All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] Introduce a version6 of autofs interface, to fix design error.
@ 2012-04-26 13:34 Michael Tokarev
  2012-04-26 13:44 ` Michael Tokarev
  2012-04-27  0:37 ` Linus Torvalds
  0 siblings, 2 replies; 44+ messages in thread
From: Michael Tokarev @ 2012-04-26 13:34 UTC (permalink / raw)
  To: Linux-kernel
  Cc: Michael Tokarev, Linus Torvalds, autofs, Ian Kent, Thomas Meyer, stable

This patch introduces a new autofs interface version, v6, which
is exactly the same as v5 but uses better defined packet structure
which does not change between architectures or platform bitness.
Old interface is supported still.

Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Cc: Ian Kent <raven@themaw.net>
Cc: Thomas Meyer <thomas@m3y3r.de>
Cc: stable@kernel.org
---
 fs/autofs4/waitq.c       |    6 ++++--
 include/linux/auto_fs4.h |   31 +++++++++++++++++++++++++++----
 2 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/fs/autofs4/waitq.c b/fs/autofs4/waitq.c
index da8876d..97866e8 100644
--- a/fs/autofs4/waitq.c
+++ b/fs/autofs4/waitq.c
@@ -145,7 +145,7 @@ static void autofs4_notify_daemon(struct autofs_sb_info *sbi,
 		break;
 	}
 	/*
-	 * Kernel protocol v5 packet for handling indirect and direct
+	 * Kernel protocol v5/v6 packet for handling indirect and direct
 	 * mount missing and expire requests
 	 */
 	case autofs_ptype_missing_indirect:
@@ -155,7 +155,9 @@ static void autofs4_notify_daemon(struct autofs_sb_info *sbi,
 	{
 		struct autofs_v5_packet *packet = &pkt.v5_pkt.v5_packet;
 
-		pktsz = sizeof(*packet);
+		pktsz = (sbi->version == 5) ?
+			sizeof(struct autofs_v5_bad_packet) :
+			sizeof(struct autofs_v5_packet);
 
 		packet->wait_queue_token = wq->wait_queue_token;
 		packet->len = wq->name.len;
diff --git a/include/linux/auto_fs4.h b/include/linux/auto_fs4.h
index e02982f..fdef141 100644
--- a/include/linux/auto_fs4.h
+++ b/include/linux/auto_fs4.h
@@ -20,11 +20,11 @@
 #undef AUTOFS_MIN_PROTO_VERSION
 #undef AUTOFS_MAX_PROTO_VERSION
 
-#define AUTOFS_PROTO_VERSION		5
+#define AUTOFS_PROTO_VERSION		6
 #define AUTOFS_MIN_PROTO_VERSION	3
-#define AUTOFS_MAX_PROTO_VERSION	5
+#define AUTOFS_MAX_PROTO_VERSION	6
 
-#define AUTOFS_PROTO_SUBVERSION		2
+#define AUTOFS_PROTO_SUBVERSION		0
 
 /* Mask for expire behaviour */
 #define AUTOFS_EXP_IMMEDIATE		1
@@ -126,7 +126,7 @@ union autofs_packet_union {
 	struct autofs_packet_expire_multi expire_multi;
 };
 
-/* autofs v5 common packet struct */
+/* autofs v5/v6 common packet struct */
 struct autofs_v5_packet {
 	struct autofs_packet_hdr hdr;
 	autofs_wqt_t wait_queue_token;
@@ -138,6 +138,29 @@ struct autofs_v5_packet {
 	__u32 tgid;
 	__u32 len;
 	char name[NAME_MAX+1];
+
+	/* padding to make
+	 *   sizeof(autofs_v5_packet) % sizeof(__u64) == 0
+	 * on all arches */
+	char _pad[4];
+};
+
+/* earlier kernel interface used this structure for v5 packet,
+ * which is exactly the same as autofs_v5_packet above
+ * but does not have padding at the end.  The result was that
+ * sizeof it was 300 bytes on x86-32 and 304 bytes on x86-64.
+ * We keep it for compatibility with older userspace. */
+struct autofs_v5_bad_packet {
+	struct autofs_packet_hdr hdr;
+	autofs_wqt_t wait_queue_token;
+	__u32 dev;
+	__u64 ino;
+	__u32 uid;
+	__u32 gid;
+	__u32 pid;
+	__u32 tgid;
+	__u32 len;
+	char name[NAME_MAX+1];
 };
 
 typedef struct autofs_v5_packet autofs_packet_missing_indirect_t;
-- 
1.7.10


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] Introduce a version6 of autofs interface, to fix design error.
  2012-04-26 13:34 [PATCH v2] Introduce a version6 of autofs interface, to fix design error Michael Tokarev
@ 2012-04-26 13:44 ` Michael Tokarev
  2012-04-27  0:37 ` Linus Torvalds
  1 sibling, 0 replies; 44+ messages in thread
From: Michael Tokarev @ 2012-04-26 13:44 UTC (permalink / raw)
  To: Linux-kernel; +Cc: Linus Torvalds, autofs, Ian Kent, Thomas Meyer, stable

On 26.04.2012 17:34, Michael Tokarev wrote:
> This patch introduces a new autofs interface version, v6, which
> is exactly the same as v5 but uses better defined packet structure
> which does not change between architectures or platform bitness.
> Old interface is supported still.

I forgot to mention: this should go on top of revert of the
previous attempt to fix it, ie, after reverting commit
a32744d4abae24572eff7269bc17895c41bd0085.

The change does not break old API on which the workaround
in the automount daemon relies since new sizeof(packet)
will be dividable by 8 and hence the old workaround code
in automount (compiled with new header) will not trigger.

Thanks,

/mjt

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] Introduce a version6 of autofs interface, to fix design error.
  2012-04-26 13:34 [PATCH v2] Introduce a version6 of autofs interface, to fix design error Michael Tokarev
  2012-04-26 13:44 ` Michael Tokarev
@ 2012-04-27  0:37 ` Linus Torvalds
  2012-04-27  9:45   ` Michael Tokarev
  1 sibling, 1 reply; 44+ messages in thread
From: Linus Torvalds @ 2012-04-27  0:37 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: Linux-kernel, autofs, Ian Kent, Thomas Meyer, stable

None of this fixes the fact that current 'systemd' binaries just use
'union autofs_v5_packet_union' and do a sizeof on it.

We're not breaking systemd either. You seem to not care about that,
but as long as you just blindly continue to not care, I'm going to
continue to not care about the patches.

It's that simple.

It is unfortunate that we have this idiocy with different packages
having different breakage due to the stupidity of the v5 packet, but
it's REALITY. A simple revert won't fix it, and saying "and here is
version 6" won't matter one whit to existing packages.

               Linus

On Thu, Apr 26, 2012 at 6:34 AM, Michael Tokarev <mjt@tls.msk.ru> wrote:
> This patch introduces a new autofs interface version, v6, which
> is exactly the same as v5 but uses better defined packet structure
> which does not change between architectures or platform bitness.
> Old interface is supported still.
>
> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
> Cc: Ian Kent <raven@themaw.net>
> Cc: Thomas Meyer <thomas@m3y3r.de>
> Cc: stable@kernel.org
> ---
>  fs/autofs4/waitq.c       |    6 ++++--
>  include/linux/auto_fs4.h |   31 +++++++++++++++++++++++++++----
>  2 files changed, 31 insertions(+), 6 deletions(-)
>
> diff --git a/fs/autofs4/waitq.c b/fs/autofs4/waitq.c
> index da8876d..97866e8 100644
> --- a/fs/autofs4/waitq.c
> +++ b/fs/autofs4/waitq.c
> @@ -145,7 +145,7 @@ static void autofs4_notify_daemon(struct autofs_sb_info *sbi,
>                break;
>        }
>        /*
> -        * Kernel protocol v5 packet for handling indirect and direct
> +        * Kernel protocol v5/v6 packet for handling indirect and direct
>         * mount missing and expire requests
>         */
>        case autofs_ptype_missing_indirect:
> @@ -155,7 +155,9 @@ static void autofs4_notify_daemon(struct autofs_sb_info *sbi,
>        {
>                struct autofs_v5_packet *packet = &pkt.v5_pkt.v5_packet;
>
> -               pktsz = sizeof(*packet);
> +               pktsz = (sbi->version == 5) ?
> +                       sizeof(struct autofs_v5_bad_packet) :
> +                       sizeof(struct autofs_v5_packet);
>
>                packet->wait_queue_token = wq->wait_queue_token;
>                packet->len = wq->name.len;
> diff --git a/include/linux/auto_fs4.h b/include/linux/auto_fs4.h
> index e02982f..fdef141 100644
> --- a/include/linux/auto_fs4.h
> +++ b/include/linux/auto_fs4.h
> @@ -20,11 +20,11 @@
>  #undef AUTOFS_MIN_PROTO_VERSION
>  #undef AUTOFS_MAX_PROTO_VERSION
>
> -#define AUTOFS_PROTO_VERSION           5
> +#define AUTOFS_PROTO_VERSION           6
>  #define AUTOFS_MIN_PROTO_VERSION       3
> -#define AUTOFS_MAX_PROTO_VERSION       5
> +#define AUTOFS_MAX_PROTO_VERSION       6
>
> -#define AUTOFS_PROTO_SUBVERSION                2
> +#define AUTOFS_PROTO_SUBVERSION                0
>
>  /* Mask for expire behaviour */
>  #define AUTOFS_EXP_IMMEDIATE           1
> @@ -126,7 +126,7 @@ union autofs_packet_union {
>        struct autofs_packet_expire_multi expire_multi;
>  };
>
> -/* autofs v5 common packet struct */
> +/* autofs v5/v6 common packet struct */
>  struct autofs_v5_packet {
>        struct autofs_packet_hdr hdr;
>        autofs_wqt_t wait_queue_token;
> @@ -138,6 +138,29 @@ struct autofs_v5_packet {
>        __u32 tgid;
>        __u32 len;
>        char name[NAME_MAX+1];
> +
> +       /* padding to make
> +        *   sizeof(autofs_v5_packet) % sizeof(__u64) == 0
> +        * on all arches */
> +       char _pad[4];
> +};
> +
> +/* earlier kernel interface used this structure for v5 packet,
> + * which is exactly the same as autofs_v5_packet above
> + * but does not have padding at the end.  The result was that
> + * sizeof it was 300 bytes on x86-32 and 304 bytes on x86-64.
> + * We keep it for compatibility with older userspace. */
> +struct autofs_v5_bad_packet {
> +       struct autofs_packet_hdr hdr;
> +       autofs_wqt_t wait_queue_token;
> +       __u32 dev;
> +       __u64 ino;
> +       __u32 uid;
> +       __u32 gid;
> +       __u32 pid;
> +       __u32 tgid;
> +       __u32 len;
> +       char name[NAME_MAX+1];
>  };
>
>  typedef struct autofs_v5_packet autofs_packet_missing_indirect_t;
> --
> 1.7.10
>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] Introduce a version6 of autofs interface, to fix design error.
  2012-04-27  0:37 ` Linus Torvalds
@ 2012-04-27  9:45   ` Michael Tokarev
  2012-04-27 15:47     ` Mark Lord
                       ` (2 more replies)
  0 siblings, 3 replies; 44+ messages in thread
From: Michael Tokarev @ 2012-04-27  9:45 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux-kernel, autofs, Ian Kent, Thomas Meyer, stable

On 27.04.2012 04:37, Linus Torvalds wrote:
> None of this fixes the fact that current 'systemd' binaries just use
> 'union autofs_v5_packet_union' and do a sizeof on it.
> 
> We're not breaking systemd either. You seem to not care about that,
> but as long as you just blindly continue to not care, I'm going to
> continue to not care about the patches.

I posted an alternative patch that "fixes" this issue for both
old systemd and old autofs, by checking for current->comm being
"automount". Not nice solution but it should work in practice.

It is not that I don't _care_ about systemd, I just think that
_if_ I'd choose from two evils, I'd pick the one which is less --
note the _if_.

Please note: the talk is about 32bit userspace on 64bit kernel,
which should be quite rare these days, or something of less
priority.

And please note also that systemd developers are saying they're
fine with new interface and non-working old binaries of systemd,
as long as this new interface actually solves the problem.

> It's that simple.
> 
> It is unfortunate that we have this idiocy with different packages
> having different breakage due to the stupidity of the v5 packet, but
> it's REALITY. A simple revert won't fix it, and saying "and here is
> version 6" won't matter one whit to existing packages.

There are always tradeoffs, and we have to pick something.  _IF_
(again, note the if) we have to break Something because no other
solution to actual existing problem is found, this breaking have
to be done.

That's reality too, and _current_ reality we have _now_ is that
autofs package, which worked fine for many years before, is broken.
While systemd actually NEVER worked in this context so far, except
of 3.3 kernel which included the fix.

So this is not something about me caring or not, it is about the
situation we're in: we had working autofs and non-working systemd,
now it is the opposite.  I just propose to have a bit less ugly
fix in kernel and have working autofs as before and working NEW
systemd.  To me that's  the least of two evils.

Thanks,

/mjt

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] Introduce a version6 of autofs interface, to fix design error.
  2012-04-27  9:45   ` Michael Tokarev
@ 2012-04-27 15:47     ` Mark Lord
  2012-04-27 20:37       ` H. Peter Anvin
  2012-04-27 16:22     ` David Miller
  2012-04-27 18:19     ` Linus Torvalds
  2 siblings, 1 reply; 44+ messages in thread
From: Mark Lord @ 2012-04-27 15:47 UTC (permalink / raw)
  To: Michael Tokarev
  Cc: Linus Torvalds, Linux-kernel, autofs, Ian Kent, Thomas Meyer, stable

On 12-04-27 05:45 AM, Michael Tokarev wrote:
..
> Please note: the talk is about 32bit userspace on 64bit kernel,
> which should be quite rare these days, or something of less
> priority.


I suspect 32bit-PAE is more common than 32/64,
but both are valid and not uncommon in the wild.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] Introduce a version6 of autofs interface, to fix design error.
  2012-04-27  9:45   ` Michael Tokarev
  2012-04-27 15:47     ` Mark Lord
@ 2012-04-27 16:22     ` David Miller
  2012-04-27 17:10       ` Michael Tokarev
  2012-04-27 18:19     ` Linus Torvalds
  2 siblings, 1 reply; 44+ messages in thread
From: David Miller @ 2012-04-27 16:22 UTC (permalink / raw)
  To: mjt; +Cc: torvalds, linux-kernel, autofs, raven, thomas, stable

From: Michael Tokarev <mjt@tls.msk.ru>
Date: Fri, 27 Apr 2012 13:45:30 +0400

> Please note: the talk is about 32bit userspace on 64bit kernel,
> which should be quite rare these days, or something of less
> priority.

You're talking about %99 of powerpc and sparc systems.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] Introduce a version6 of autofs interface, to fix design error.
  2012-04-27 16:22     ` David Miller
@ 2012-04-27 17:10       ` Michael Tokarev
  2012-04-27 17:28         ` David Miller
  0 siblings, 1 reply; 44+ messages in thread
From: Michael Tokarev @ 2012-04-27 17:10 UTC (permalink / raw)
  To: David Miller; +Cc: torvalds, linux-kernel, autofs, raven, thomas, stable

On 27.04.2012 20:22, David Miller wrote:
> From: Michael Tokarev <mjt@tls.msk.ru>
> Date: Fri, 27 Apr 2012 13:45:30 +0400
> 
>> Please note: the talk is about 32bit userspace on 64bit kernel,
>> which should be quite rare these days, or something of less
>> priority.
> 
> You're talking about %99 of powerpc and sparc systems.

Nope.  Only about x86 32 on 64 bits, and only with 3.3 kernel.
The original problem does not exist on ppc, at least as far as
I understand.

/mjt


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] Introduce a version6 of autofs interface, to fix design error.
  2012-04-27 17:10       ` Michael Tokarev
@ 2012-04-27 17:28         ` David Miller
  0 siblings, 0 replies; 44+ messages in thread
From: David Miller @ 2012-04-27 17:28 UTC (permalink / raw)
  To: mjt; +Cc: torvalds, linux-kernel, autofs, raven, thomas, stable

From: Michael Tokarev <mjt@tls.msk.ru>
Date: Fri, 27 Apr 2012 21:10:55 +0400

> On 27.04.2012 20:22, David Miller wrote:
>> From: Michael Tokarev <mjt@tls.msk.ru>
>> Date: Fri, 27 Apr 2012 13:45:30 +0400
>> 
>>> Please note: the talk is about 32bit userspace on 64bit kernel,
>>> which should be quite rare these days, or something of less
>>> priority.
>> 
>> You're talking about %99 of powerpc and sparc systems.
> 
> Nope.  Only about x86 32 on 64 bits, and only with 3.3 kernel.
> The original problem does not exist on ppc, at least as far as
> I understand.

I was making a general point about 32-bit userspace on a 64-bit
kernel, but yes for this specific bug the issue doesn't exist.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] Introduce a version6 of autofs interface, to fix design error.
  2012-04-27  9:45   ` Michael Tokarev
  2012-04-27 15:47     ` Mark Lord
  2012-04-27 16:22     ` David Miller
@ 2012-04-27 18:19     ` Linus Torvalds
  2012-04-27 18:34       ` David Miller
  2012-04-27 20:42       ` H. Peter Anvin
  2 siblings, 2 replies; 44+ messages in thread
From: Linus Torvalds @ 2012-04-27 18:19 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: Linux-kernel, autofs, Ian Kent, Thomas Meyer, stable

On Fri, Apr 27, 2012 at 2:45 AM, Michael Tokarev <mjt@tls.msk.ru> wrote:
>
> I posted an alternative patch that "fixes" this issue for both
> old systemd and old autofs, by checking for current->comm being
> "automount". Not nice solution but it should work in practice.

Yes, I agree, it's probably what we have to go with.

I really would prefer to match on something else than ->comm, though.
I would hope that there is some *behavioral* difference between autofs
and systemd that we could trigger on. That said, it's always going to
be conditional on the whole "running 32-bit daemon on 64-bit kernel",
so whatever solution we pick will at least always be self-limiting to
that specific nasty buglet case.

> And please note also that systemd developers are saying they're
> fine with new interface and non-working old binaries of systemd,
> as long as this new interface actually solves the problem.

So the thing is, I think that a new "version 6" is worthless, if the
only problem it solves is this one - since I feel that we have to fix
the binary compatibility issue regardless.

So to me, a version 6 with just this fix is just completely pointless.
We can't just ignore existing binaries. You don't like ignoring
existing 'automount' binaries, and Thomas doesn't like ignoring
existing 'systemd' binaries. And I don't like ignoring *any* binaries
that are shipped with distributions - especially when they are central
and hard to work around.

> That's reality too, and _current_ reality we have _now_ is that
> autofs package, which worked fine for many years before, is broken.
> While systemd actually NEVER worked in this context so far, except
> of 3.3 kernel which included the fix.

Oh, I agree. We need to fix the autofs package. But reverting isn't
going to be acceptable either.

The strcmp is actually something I can accept, but I'd want it
separated out in a nicer helper function with a big comment. And quite
frankly, I'd really prefer to be able to trigger on some other
behavior - like looking at the exact details of the mount() system
call or maybe the ioctl's that get done on the control fd.

But yes, the strcmp is ugly, but it's less ugly than saying "we won't
fix this, and instead we'll create a whole new pointless protocol
version".

Btw, the whole autofs protocol is *full* of stuff like this. I just
looked at some other places where the automount daemon does reads of
fixed sizes, and one of them is a "sizeof(enum states)". Doing a
sizeof() on an enum is a f*cking bad idea - it's not very well-defined
at all (different compilers will consider the enum different sizes -
seriously). But at least that one seems to be something that is purely
internal to autofs - but it does show that the people involved did not
think through and design the protocols they used in general - more of
these kinds of "random sizes of random data structures that we don't
understand".

                         Linus

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] Introduce a version6 of autofs interface, to fix design error.
  2012-04-27 18:19     ` Linus Torvalds
@ 2012-04-27 18:34       ` David Miller
  2012-04-27 18:42         ` Linus Torvalds
  2012-04-27 20:42       ` H. Peter Anvin
  1 sibling, 1 reply; 44+ messages in thread
From: David Miller @ 2012-04-27 18:34 UTC (permalink / raw)
  To: torvalds; +Cc: mjt, linux-kernel, autofs, raven, thomas, stable

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Fri, 27 Apr 2012 11:19:13 -0700

> So to me, a version 6 with just this fix is just completely pointless.
> We can't just ignore existing binaries. You don't like ignoring
> existing 'automount' binaries, and Thomas doesn't like ignoring
> existing 'systemd' binaries. And I don't like ignoring *any* binaries
> that are shipped with distributions - especially when they are central
> and hard to work around.

I think what we did was break a kernel exported interface which had
5 years of precedence.  Ugly, or broken, it was the state of affairs
and userland did embrace it. :-)

If systemd wants to use version 5 of this thing, it has to have the
same workaround code automountd has.  It is systemd (and now the
kernel) which is broken.

We should have never touched version 5 of the data-structure.  Having
5 years of workaround precedence in userspace proves this.

And we should create a version 6 as has been proposed.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] Introduce a version6 of autofs interface, to fix design error.
  2012-04-27 18:34       ` David Miller
@ 2012-04-27 18:42         ` Linus Torvalds
  2012-04-27 18:55           ` Linus Torvalds
  2012-04-27 19:08           ` David Miller
  0 siblings, 2 replies; 44+ messages in thread
From: Linus Torvalds @ 2012-04-27 18:42 UTC (permalink / raw)
  To: David Miller; +Cc: mjt, linux-kernel, autofs, raven, thomas, stable

On Fri, Apr 27, 2012 at 11:34 AM, David Miller <davem@davemloft.net> wrote:
>
> If systemd wants to use version 5 of this thing, it has to have the
> same workaround code automountd has.  It is systemd (and now the
> kernel) which is broken.

There's no question that systemd is broken.

But those broken binaries are out in the wild. End of story.

> We should have never touched version 5 of the data-structure.  Having
> 5 years of workaround precedence in userspace proves this.

Umm. The thing is, those broken binaries *work* on 32-bit.  They were
*tested* on 32-bit. They were *shipped* on 32-bit.

And we really do want to make it just work for people to build 64-bit
kernels. It is simply not acceptably to say "hey, you have 32-bit user
land, so use a 32-bit kernel". It's not a different ABI, never has
been.

It's entirely pointless to talk about some next version. THAT FIXES
NOTHING. What's so hard to understand? This is an existing problem,
and making a v6 DOES NOT FIX ANYTHING AT ALL.

And people who bring it up as a "fix" are entirely missing the point.

Seriously.

                      Linus

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] Introduce a version6 of autofs interface, to fix design error.
  2012-04-27 18:42         ` Linus Torvalds
@ 2012-04-27 18:55           ` Linus Torvalds
  2012-04-27 19:14             ` David Miller
  2012-04-27 19:08           ` David Miller
  1 sibling, 1 reply; 44+ messages in thread
From: Linus Torvalds @ 2012-04-27 18:55 UTC (permalink / raw)
  To: David Miller; +Cc: mjt, linux-kernel, autofs, raven, thomas, stable

On Fri, Apr 27, 2012 at 11:42 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> There's no question that systemd is broken.

Actually, I'll take that back.

Yes, systemd has breakage. But it's actually automount that is the
truly broken piece of sh*t.

I think that 'automount' is even more broken. The fact that the
automount maintainers knew about this, and added TOTALLY BROKEN code
to their automount source tree, over five years ago, because the
authors clearly did not understand what the f*ck they were doing,
that's the real problem.

Look at the automount source code:

        if (pkt_len % 8) {
                if (strcmp(un.machine, "alpha") == 0 ||
                    strcmp(un.machine, "ia64") == 0 ||
                    strcmp(un.machine, "x86_64") == 0 ||
                    strcmp(un.machine, "ppc64") == 0)
                        pkt_len += 4;

        }

and realize that the above is *wrong*. It's complete and utter shit.
It's actively insane. The above source code makes zero sense.

The whole issue does not *exist* on alpha, ia64, or ppc64 (or sparc64,
which isn't even listed), because those architectures don't have the
issue that their 32-bit version has different default alignment for a
64-bit entity.

So autofs-tools was and remains actively *wrong*.

In contrast, the systemd source code actually makes sense. We should
point to it as the *correct* user, and autofs as the legacy bug.

Which is why I actually think that it makes more sense to special-case
the real bug - in automount. It's why I think that it's ok to do the
strcmp(), and why the strcmp should check for "automount", not
"systemd".

I just think it needs a big honking comment that explains it. Something like

   int autofs_compat_interface()
   {
      if (!is_compat_task())
            return 0;
      /* "autofs" has crazy shit going on working around an old compat
bug, and actually
          expects the non-compat size with a 64-bit kernel */
      if (!strcmp(current->comm, "automount"))
            return 0;
      return 1;
   }

because it really is an autofs bug.

If the autofs people had understood what they were doing, none of this
would ever have happened.

Still, checking for strings is ugly. I do think it would be even
better if we could trigger on another difference, but whatever..

                    Linus

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] Introduce a version6 of autofs interface, to fix design error.
  2012-04-27 18:42         ` Linus Torvalds
  2012-04-27 18:55           ` Linus Torvalds
@ 2012-04-27 19:08           ` David Miller
  2012-04-27 20:45             ` H. Peter Anvin
  1 sibling, 1 reply; 44+ messages in thread
From: David Miller @ 2012-04-27 19:08 UTC (permalink / raw)
  To: torvalds; +Cc: mjt, linux-kernel, autofs, raven, thomas, stable

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Fri, 27 Apr 2012 11:42:58 -0700

> On Fri, Apr 27, 2012 at 11:34 AM, David Miller <davem@davemloft.net> wrote:
>>
>> If systemd wants to use version 5 of this thing, it has to have the
>> same workaround code automountd has.  It is systemd (and now the
>> kernel) which is broken.
> 
> There's no question that systemd is broken.
> 
> But those broken binaries are out in the wild. End of story.

And my point is that automountd was out there longer, and encoded
the "broken" structure size into the resulting binaries.

>> We should have never touched version 5 of the data-structure.  Having
>> 5 years of workaround precedence in userspace proves this.
> 
> Umm. The thing is, those broken binaries *work* on 32-bit.  They were
> *tested* on 32-bit. They were *shipped* on 32-bit.

As were the automount binaries.

systemd coded to an interface which did not exist in reality, and if
they had used the automountd sources as a reference (the only other
user of this interface) they would have seen this.

Look, I can almost guarantee that whoever wrote the automountd
workaround code looked at this situation and said "yeah, it's
impossible to get this right in the kernel for all cases for v5, so
let's just do it where we _can_ be absolutely certain and that's here
in this userland routine doing the autofs stuff"

And you know what, whoever that guy was, he was right.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] Introduce a version6 of autofs interface, to fix design error.
  2012-04-27 18:55           ` Linus Torvalds
@ 2012-04-27 19:14             ` David Miller
  2012-04-27 19:16               ` David Miller
  2012-04-28  1:56               ` Ian Kent
  0 siblings, 2 replies; 44+ messages in thread
From: David Miller @ 2012-04-27 19:14 UTC (permalink / raw)
  To: torvalds; +Cc: mjt, linux-kernel, autofs, raven, thomas, stable

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Fri, 27 Apr 2012 11:55:12 -0700

> On Fri, Apr 27, 2012 at 11:42 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> There's no question that systemd is broken.
> 
> Actually, I'll take that back.
> 
> Yes, systemd has breakage. But it's actually automount that is the
> truly broken piece of sh*t.
> 
> I think that 'automount' is even more broken. The fact that the
> automount maintainers knew about this, and added TOTALLY BROKEN code
> to their automount source tree, over five years ago, because the
> authors clearly did not understand what the f*ck they were doing,
> that's the real problem.

I respectfully disagree.

It's ugly as shit, but it is the only one place where one can be
absolutely sure that we are dealing with a pipe passing those v5
things around.

All these hacks we have been talking about, assuming the mount means
the pipe is for passing structure so-and-so around, and now trying
to find some other check such as one on current->comm...

That's better?

Only the application really knows.  And I bet the person who wrote
that automountd code you find so distasteful analyzed this and came
to realize how difficult the kernel side would be to get right.

So they decided that the datastructure on 32-bit changes based upon
whether we run on a 64-bit kernel or not, that's the interface the
kernel has always exported, and therefore that's the interface it
choose to use.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] Introduce a version6 of autofs interface, to fix design error.
  2012-04-27 19:14             ` David Miller
@ 2012-04-27 19:16               ` David Miller
  2012-04-27 19:19                 ` Linus Torvalds
  2012-04-28  1:56               ` Ian Kent
  1 sibling, 1 reply; 44+ messages in thread
From: David Miller @ 2012-04-27 19:16 UTC (permalink / raw)
  To: torvalds; +Cc: mjt, linux-kernel, autofs, raven, thomas, stable

From: David Miller <davem@davemloft.net>
Date: Fri, 27 Apr 2012 15:14:33 -0400 (EDT)

> From: Linus Torvalds <torvalds@linux-foundation.org>
> Date: Fri, 27 Apr 2012 11:55:12 -0700
> 
>> On Fri, Apr 27, 2012 at 11:42 AM, Linus Torvalds
>> <torvalds@linux-foundation.org> wrote:
>>>
>>> There's no question that systemd is broken.
>> 
>> Actually, I'll take that back.
>> 
>> Yes, systemd has breakage. But it's actually automount that is the
>> truly broken piece of sh*t.
>> 
>> I think that 'automount' is even more broken. The fact that the
>> automount maintainers knew about this, and added TOTALLY BROKEN code
>> to their automount source tree, over five years ago, because the
>> authors clearly did not understand what the f*ck they were doing,
>> that's the real problem.
> 
> I respectfully disagree.

BTW, I want to clarify that my position is that userland might have
been the best place to handle this rather than the kernel.

I fully recognize that the automountd test is busted on ppc, sparc,
et al. 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] Introduce a version6 of autofs interface, to fix design error.
  2012-04-27 19:16               ` David Miller
@ 2012-04-27 19:19                 ` Linus Torvalds
  2012-04-27 19:24                   ` David Miller
  0 siblings, 1 reply; 44+ messages in thread
From: Linus Torvalds @ 2012-04-27 19:19 UTC (permalink / raw)
  To: David Miller; +Cc: mjt, linux-kernel, autofs, raven, thomas, stable

On Fri, Apr 27, 2012 at 12:16 PM, David Miller <davem@davemloft.net> wrote:
>
> BTW, I want to clarify that my position is that userland might have
> been the best place to handle this rather than the kernel.

It definitely is not.

You are ignoring the fact that the user-land workaround resulted in an
independent bug in *another* user-land project.

Your whole argument was apparently that "that point in user-land is
the only place that knows the details". But that argument is bogus
crap - and we can *see* that it is bogus crap by the fact that there
was *another* user-land place that had this problem, but didn't
realize it.

That's why fixing it in user land was always wrong. There isn't just a
single user of this interface!

Btw, I think I may have an alternate fix. Which is to simply add a
"packetized" mode to pipes.

                   Linus

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] Introduce a version6 of autofs interface, to fix design error.
  2012-04-27 19:19                 ` Linus Torvalds
@ 2012-04-27 19:24                   ` David Miller
  2012-04-27 19:56                     ` Linus Torvalds
  0 siblings, 1 reply; 44+ messages in thread
From: David Miller @ 2012-04-27 19:24 UTC (permalink / raw)
  To: torvalds; +Cc: mjt, linux-kernel, autofs, raven, thomas, stable

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Fri, 27 Apr 2012 12:19:56 -0700

> Btw, I think I may have an alternate fix. Which is to simply add a
> "packetized" mode to pipes.

If you can pull that off, it would certainly work and be the best
solution.

BTW, this has happened before, we had this same problem passing data
over netlink sockets.  And since it's packetized already we had a
slightly easier time dealing with it.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] Introduce a version6 of autofs interface, to fix design error.
  2012-04-27 19:24                   ` David Miller
@ 2012-04-27 19:56                     ` Linus Torvalds
  2012-04-27 20:13                       ` Stef Bon
                                         ` (3 more replies)
  0 siblings, 4 replies; 44+ messages in thread
From: Linus Torvalds @ 2012-04-27 19:56 UTC (permalink / raw)
  To: David Miller; +Cc: mjt, linux-kernel, autofs, raven, thomas, stable

[-- Attachment #1: Type: text/plain, Size: 2337 bytes --]

On Fri, Apr 27, 2012 at 12:24 PM, David Miller <davem@davemloft.net> wrote:
>
> If you can pull that off, it would certainly work and be the best
> solution.
>
> BTW, this has happened before, we had this same problem passing data
> over netlink sockets.  And since it's packetized already we had a
> slightly easier time dealing with it.

Ok, so here's a fairly quick hack.

NOTE! I haven't actually tested this, because I don't use autofs
(well, not knowingly - maybe I have a machine that actually does).

What it does is add a new "packetized" flag to pipes, which does two
simple things:
 - do not merge write data in pipe buffers when writing
 - when reading, consider a pipe buffer to be "spent" when any of the
data has been read

Of course, if you write more than PIPE_BUFFER bytes, the packetization
doesn't work, but hey, if you do that, it's your own damn problem.

Then, this just makes (for debugging!) autofs always write eight extra
bytes of garbage (NOTE NOTE NOTE! This will write uninitialized data,
so this is literally just for debugging this particular issue), so
that if the whole "kernel writes a few bytes extra" approach with pipe
packetization doesn't work, you would see it even on native 32-bit or
64-bit mode, without having to even test the special case of "32-bit
automount binary on a 64-bit kernel". So *anybody* who uses autofs can
test this patch and see if autofs still works for them despite the
extra padding on the write.

So note the "+8" in autofs_v5_packet_size(). It's there on purpose,
but it needs to be removed for actual final results if this works for
testing.

Comments? The patch looks fairly simple. The "packetized pipe" might
even be useful for other users and maybe we might want to expose it as
an actual pipe fcntl, but right now the only thing that sets that flag
is autofs.

Does this even work? I might well have missed some obvious thing, this
was put together pretty quickly, but I think the *concept* may be the
right approach to this whole mess.

(And yes, we could probably just pack the "struct pipe_inode_info"
better. We don't really need 32 bits for the pipe writer count etc.
But to make the patch simple, I just added a whole new bitfield, which
will just grow that silly structure).

                       Linus

[-- Attachment #2: patch.diff --]
[-- Type: application/octet-stream, Size: 4469 bytes --]

 fs/autofs4/autofs_i.h     |   11 +++++++++++
 fs/autofs4/dev-ioctl.c    |    2 +-
 fs/autofs4/inode.c        |    2 +-
 fs/autofs4/waitq.c        |   14 ++++++--------
 fs/pipe.c                 |    6 +++---
 include/linux/pipe_fs_i.h |    1 +
 6 files changed, 23 insertions(+), 13 deletions(-)

diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
index eb1cc92cd67d..f8642b8a1779 100644
--- a/fs/autofs4/autofs_i.h
+++ b/fs/autofs4/autofs_i.h
@@ -34,6 +34,7 @@
 #include <linux/sched.h>
 #include <linux/mount.h>
 #include <linux/namei.h>
+#include <linux/pipe_fs_i.h>
 #include <asm/current.h>
 #include <asm/uaccess.h>
 
@@ -270,6 +271,16 @@ int autofs4_fill_super(struct super_block *, void *, int);
 struct autofs_info *autofs4_new_ino(struct autofs_sb_info *);
 void autofs4_clean_ino(struct autofs_info *);
 
+static inline int autofs_prepare_pipe(struct file *pipe)
+{
+	if (!pipe->f_op || !pipe->f_op->write)
+		return -EINVAL;
+	if (!pipe->f_dentry->d_inode->i_pipe)
+		return -EINVAL;
+	pipe->f_dentry->d_inode->i_pipe->packetized = 1;
+	return 0;
+}
+
 /* Queue management functions */
 
 int autofs4_wait(struct autofs_sb_info *,struct dentry *, enum autofs_notify);
diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
index 9dacb8586701..6259e7142032 100644
--- a/fs/autofs4/dev-ioctl.c
+++ b/fs/autofs4/dev-ioctl.c
@@ -376,7 +376,7 @@ static int autofs_dev_ioctl_setpipefd(struct file *fp,
 			err = -EBADF;
 			goto out;
 		}
-		if (!pipe->f_op || !pipe->f_op->write) {
+		if (autofs_prepare_pipe(pipe) < 0) {
 			err = -EPIPE;
 			fput(pipe);
 			goto out;
diff --git a/fs/autofs4/inode.c b/fs/autofs4/inode.c
index d8dc002e9cc3..c525b74deefd 100644
--- a/fs/autofs4/inode.c
+++ b/fs/autofs4/inode.c
@@ -292,7 +292,7 @@ int autofs4_fill_super(struct super_block *s, void *data, int silent)
 		printk("autofs: could not open pipe file descriptor\n");
 		goto fail_dput;
 	}
-	if (!pipe->f_op || !pipe->f_op->write)
+	if (autofs_prepare_pipe(pipe) < 0)
 		goto fail_fput;
 	sbi->pipe = pipe;
 	sbi->pipefd = pipefd;
diff --git a/fs/autofs4/waitq.c b/fs/autofs4/waitq.c
index 9c098db43344..5ce5026200b9 100644
--- a/fs/autofs4/waitq.c
+++ b/fs/autofs4/waitq.c
@@ -97,16 +97,14 @@ static int autofs4_write(struct autofs_sb_info *sbi,
  *
  * The packets are identical on x86-32 and x86-64, but have different
  * alignment. Which means that 'sizeof()' will give different results.
- * Fix it up for the case of running 32-bit user mode on a 64-bit kernel.
+ * However, we packetize the pipe, so just use the bigger size, the
+ * pipe read will discard the rest.
+ *
+ * This adds on 8 bytes of garbage FOR DEBUGGING ONLY!
  */
-static noinline size_t autofs_v5_packet_size(struct autofs_sb_info *sbi)
+static inline size_t autofs_v5_packet_size(struct autofs_sb_info *sbi)
 {
-	size_t pktsz = sizeof(struct autofs_v5_packet);
-#if defined(CONFIG_X86_64) && defined(CONFIG_COMPAT)
-	if (sbi->compat_daemon > 0)
-		pktsz -= 4;
-#endif
-	return pktsz;
+	return sizeof(struct autofs_v5_packet)+8;
 }
 
 static void autofs4_notify_daemon(struct autofs_sb_info *sbi,
diff --git a/fs/pipe.c b/fs/pipe.c
index 25feaa3faac0..76febdfafcc3 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -407,7 +407,7 @@ redo:
 			ret += chars;
 			buf->offset += chars;
 			buf->len -= chars;
-			if (!buf->len) {
+			if (!buf->len || pipe->packetized) {
 				buf->ops = NULL;
 				ops->release(pipe, buf);
 				curbuf = (curbuf + 1) & (pipe->buffers - 1);
@@ -416,7 +416,7 @@ redo:
 				do_wakeup = 1;
 			}
 			total_len -= chars;
-			if (!total_len)
+			if (!total_len || pipe->packetized)
 				break;	/* common path: read succeeded */
 		}
 		if (bufs)	/* More to do? */
@@ -490,7 +490,7 @@ pipe_write(struct kiocb *iocb, const struct iovec *_iov,
 
 	/* We try to merge small writes */
 	chars = total_len & (PAGE_SIZE-1); /* size of the last buffer */
-	if (pipe->nrbufs && chars != 0) {
+	if (!pipe->packetized && pipe->nrbufs && chars != 0) {
 		int lastbuf = (pipe->curbuf + pipe->nrbufs - 1) &
 							(pipe->buffers - 1);
 		struct pipe_buffer *buf = pipe->bufs + lastbuf;
diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h
index 6d626ff0cfd0..49f034aaca5e 100644
--- a/include/linux/pipe_fs_i.h
+++ b/include/linux/pipe_fs_i.h
@@ -43,6 +43,7 @@ struct pipe_buffer {
  **/
 struct pipe_inode_info {
 	wait_queue_head_t wait;
+	unsigned int packetized:1;
 	unsigned int nrbufs, curbuf, buffers;
 	unsigned int readers;
 	unsigned int writers;

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] Introduce a version6 of autofs interface, to fix design error.
  2012-04-27 19:56                     ` Linus Torvalds
@ 2012-04-27 20:13                       ` Stef Bon
  2012-04-27 20:29                       ` David Miller
                                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 44+ messages in thread
From: Stef Bon @ 2012-04-27 20:13 UTC (permalink / raw)
  To: autofs; +Cc: linux-kernel

I do not have any solution to the issue discussed here, but mentioned
is that automount and systemd are the only packages using autofs. I'm
thinking about creating a FUSE filesystem which will make use of
autofs (direct) mounting, where the FUSE fs will do something what now
automount does.  Indirect mounts are not necessary anymore, the
browsable maps will be created in the FUSE fs.

Stef Bon

2012/4/27 Linus Torvalds <torvalds@linux-foundation.org>:
> On Fri, Apr 27, 2012 at 12:24 PM, David Miller <davem@davemloft.net> wrote:
>>
>> If you can pull that off, it would certainly work and be the best
>> solution.
>>
>> BTW, this has happened before, we had this same problem passing data
>> over netlink sockets.  And since it's packetized already we had a
>> slightly easier time dealing with it.
>
> Ok, so here's a fairly quick hack.
>
> NOTE! I haven't actually tested this, because I don't use autofs
> (well, not knowingly - maybe I have a machine that actually does).
>
> What it does is add a new "packetized" flag to pipes, which does two
> simple things:
>  - do not merge write data in pipe buffers when writing
>  - when reading, consider a pipe buffer to be "spent" when any of the
> data has been read
>
> Of course, if you write more than PIPE_BUFFER bytes, the packetization
> doesn't work, but hey, if you do that, it's your own damn problem.
>
> Then, this just makes (for debugging!) autofs always write eight extra
> bytes of garbage (NOTE NOTE NOTE! This will write uninitialized data,
> so this is literally just for debugging this particular issue), so
> that if the whole "kernel writes a few bytes extra" approach with pipe
> packetization doesn't work, you would see it even on native 32-bit or
> 64-bit mode, without having to even test the special case of "32-bit
> automount binary on a 64-bit kernel". So *anybody* who uses autofs can
> test this patch and see if autofs still works for them despite the
> extra padding on the write.
>
> So note the "+8" in autofs_v5_packet_size(). It's there on purpose,
> but it needs to be removed for actual final results if this works for
> testing.
>
> Comments? The patch looks fairly simple. The "packetized pipe" might
> even be useful for other users and maybe we might want to expose it as
> an actual pipe fcntl, but right now the only thing that sets that flag
> is autofs.
>
> Does this even work? I might well have missed some obvious thing, this
> was put together pretty quickly, but I think the *concept* may be the
> right approach to this whole mess.
>
> (And yes, we could probably just pack the "struct pipe_inode_info"
> better. We don't really need 32 bits for the pipe writer count etc.
> But to make the patch simple, I just added a whole new bitfield, which
> will just grow that silly structure).
>
>                       Linus

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] Introduce a version6 of autofs interface, to fix design error.
  2012-04-27 19:56                     ` Linus Torvalds
  2012-04-27 20:13                       ` Stef Bon
@ 2012-04-27 20:29                       ` David Miller
  2012-04-27 22:40                         ` Linus Torvalds
  2012-04-27 20:43                       ` H. Peter Anvin
  2012-04-27 22:42                       ` Alan Cox
  3 siblings, 1 reply; 44+ messages in thread
From: David Miller @ 2012-04-27 20:29 UTC (permalink / raw)
  To: torvalds; +Cc: mjt, linux-kernel, autofs, raven, thomas, stable

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Fri, 27 Apr 2012 12:56:20 -0700

>  - when reading, consider a pipe buffer to be "spent" when any of the
> data has been read

So basically any sized read empties the entire pipe, right?

As long as we never generate multiple messages at the same
time, I guess this would work.

Otherwise we can't use this approach without some adjustments.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] Introduce a version6 of autofs interface, to fix design error.
  2012-04-27 15:47     ` Mark Lord
@ 2012-04-27 20:37       ` H. Peter Anvin
  2012-04-28 22:20         ` Mark Lord
  0 siblings, 1 reply; 44+ messages in thread
From: H. Peter Anvin @ 2012-04-27 20:37 UTC (permalink / raw)
  To: Mark Lord
  Cc: Michael Tokarev, Linus Torvalds, Linux-kernel, autofs, Ian Kent,
	Thomas Meyer, stable

On 04/27/2012 08:47 AM, Mark Lord wrote:
> On 12-04-27 05:45 AM, Michael Tokarev wrote:
> ..
>> Please note: the talk is about 32bit userspace on 64bit kernel,
>> which should be quite rare these days, or something of less
>> priority.
> 
> 
> I suspect 32bit-PAE is more common than 32/64,
> but both are valid and not uncommon in the wild.

32-on-64 is a very important use case, because it lets us get people off
32-bit kernels.

	-hpa


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] Introduce a version6 of autofs interface, to fix design error.
  2012-04-27 18:19     ` Linus Torvalds
  2012-04-27 18:34       ` David Miller
@ 2012-04-27 20:42       ` H. Peter Anvin
  1 sibling, 0 replies; 44+ messages in thread
From: H. Peter Anvin @ 2012-04-27 20:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Michael Tokarev, Linux-kernel, autofs, Ian Kent, Thomas Meyer, stable

On 04/27/2012 11:19 AM, Linus Torvalds wrote:
> 
> Btw, the whole autofs protocol is *full* of stuff like this. I just
> looked at some other places where the automount daemon does reads of
> fixed sizes, and one of them is a "sizeof(enum states)". Doing a
> sizeof() on an enum is a f*cking bad idea - it's not very well-defined
> at all (different compilers will consider the enum different sizes -
> seriously). But at least that one seems to be something that is purely
> internal to autofs - but it does show that the people involved did not
> think through and design the protocols they used in general - more of
> these kinds of "random sizes of random data structures that we don't
> understand".
> 

The really *really* damning thing with the v5 structure is that it
padded out the whole structure so it wouldn't have to do two read()
operations.  There is a header with a length field in it (which is still
there), and the daemon just ignores it...

	-hpa


P.S. This can still be fixed in user space, by reading the shorter
length, and then eating any additional zeroes.  The kernel does zero out
the extra pad.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] Introduce a version6 of autofs interface, to fix design error.
  2012-04-27 19:56                     ` Linus Torvalds
  2012-04-27 20:13                       ` Stef Bon
  2012-04-27 20:29                       ` David Miller
@ 2012-04-27 20:43                       ` H. Peter Anvin
  2012-04-27 22:42                         ` Linus Torvalds
  2012-04-27 22:42                       ` Alan Cox
  3 siblings, 1 reply; 44+ messages in thread
From: H. Peter Anvin @ 2012-04-27 20:43 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Miller, mjt, linux-kernel, autofs, raven, thomas, stable

On 04/27/2012 12:56 PM, Linus Torvalds wrote:
> 
> What it does is add a new "packetized" flag to pipes, which does two
> simple things:
>  - do not merge write data in pipe buffers when writing
>  - when reading, consider a pipe buffer to be "spent" when any of the
> data has been read
> 

Why not just use the already existing SOCK_SEQPACKET sockets?

	-hpa


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] Introduce a version6 of autofs interface, to fix design error.
  2012-04-27 19:08           ` David Miller
@ 2012-04-27 20:45             ` H. Peter Anvin
  0 siblings, 0 replies; 44+ messages in thread
From: H. Peter Anvin @ 2012-04-27 20:45 UTC (permalink / raw)
  To: David Miller; +Cc: torvalds, mjt, linux-kernel, autofs, raven, thomas, stable

On 04/27/2012 12:08 PM, David Miller wrote:
>>
>> Umm. The thing is, those broken binaries *work* on 32-bit.  They were
>> *tested* on 32-bit. They were *shipped* on 32-bit.
> 
> As were the automount binaries.
> 
> systemd coded to an interface which did not exist in reality, and if
> they had used the automountd sources as a reference (the only other
> user of this interface) they would have seen this.
> 
> Look, I can almost guarantee that whoever wrote the automountd
> workaround code looked at this situation and said "yeah, it's
> impossible to get this right in the kernel for all cases for v5, so
> let's just do it where we _can_ be absolutely certain and that's here
> in this userland routine doing the autofs stuff"
> 
> And you know what, whoever that guy was, he was right.

The workaround in automount is a mistake too, though.  The better
solution would have been to swallow the extra zero field; data driven.

	-hpa


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] Introduce a version6 of autofs interface, to fix design error.
  2012-04-27 20:29                       ` David Miller
@ 2012-04-27 22:40                         ` Linus Torvalds
  0 siblings, 0 replies; 44+ messages in thread
From: Linus Torvalds @ 2012-04-27 22:40 UTC (permalink / raw)
  To: David Miller; +Cc: mjt, linux-kernel, autofs, raven, thomas, stable

On Fri, Apr 27, 2012 at 1:29 PM, David Miller <davem@davemloft.net> wrote:
> From: Linus Torvalds <torvalds@linux-foundation.org>
> Date: Fri, 27 Apr 2012 12:56:20 -0700
>
>>  - when reading, consider a pipe buffer to be "spent" when any of the
>> data has been read
>
> So basically any sized read empties the entire pipe, right?
>
> As long as we never generate multiple messages at the same
> time, I guess this would work.

It would work regardless.

Read the notes (or the small patch) again. Reading a part of a pipe
buffer clears *that* buffer. And since the packetized mode also
disables merging of writes into buffers, that means that multiple
writes result in multiple buffers. By default, pipes have 16 buffers
(you can set the number if you really want to), so you can do multiple
writes of multiple packets, and then as you read them, you'll see each
write individually.

So let's say that you have a writer that does the following writes: 8
bytes, 100 bytes, 32 bytes.  If you have a reader that reads a
fixed-size buffer (let's pick 64 bytes just to make an example), the
reader will then see those exact three writes, except it will only get
the 64 first bytes of the 100-byte write (and the remaining 36 bytes
will be lost forever).

So it really is packetized.

Of course, if you try to write more than PAGE_SIZE in one single
write, the pipe code will split that up into multiple buffers, so then
a reader would see that as multiple "packets". But hey, that's how
pipes work - they have a size limit on write atomicity guarantees,
regardless of the packetized mode. And obviously right now we only
expose the packetized mode for the autofs pipe, which only does writes
of that one fixed size (~300 bytes).

                     Linus

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] Introduce a version6 of autofs interface, to fix design error.
  2012-04-27 19:56                     ` Linus Torvalds
                                         ` (2 preceding siblings ...)
  2012-04-27 20:43                       ` H. Peter Anvin
@ 2012-04-27 22:42                       ` Alan Cox
  2012-04-27 22:49                         ` Linus Torvalds
  2012-04-27 23:27                         ` Linus Torvalds
  3 siblings, 2 replies; 44+ messages in thread
From: Alan Cox @ 2012-04-27 22:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Miller, mjt, linux-kernel, autofs, raven, thomas, stable

> Comments? The patch looks fairly simple. The "packetized pipe" might
> even be useful for other users and maybe we might want to expose it as
> an actual pipe fcntl, but right now the only thing that sets that flag
> is autofs.

The obvious way to set the flag would seem to me to also btake O_DIRECT
on the pipe as meaning this ?

Alan

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] Introduce a version6 of autofs interface, to fix design error.
  2012-04-27 20:43                       ` H. Peter Anvin
@ 2012-04-27 22:42                         ` Linus Torvalds
  2012-04-27 22:56                           ` H. Peter Anvin
  0 siblings, 1 reply; 44+ messages in thread
From: Linus Torvalds @ 2012-04-27 22:42 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: David Miller, mjt, linux-kernel, autofs, raven, thomas, stable

On Fri, Apr 27, 2012 at 1:43 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>
> Why not just use the already existing SOCK_SEQPACKET sockets?

Have you looked at the interface?

The file descriptor is created in user space, and is defined to be a
pipe. So both automount and systemd create a pipe, and then pass that
pipe fd to the mount system call.

So it is not autofs that creates the file descriptor for the user.
It's the other way around: the user creates (using "pipe()") the file
descriptor, and passes it to autofs.

                        Linus

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] Introduce a version6 of autofs interface, to fix design error.
  2012-04-27 22:42                       ` Alan Cox
@ 2012-04-27 22:49                         ` Linus Torvalds
  2012-04-27 23:27                         ` Linus Torvalds
  1 sibling, 0 replies; 44+ messages in thread
From: Linus Torvalds @ 2012-04-27 22:49 UTC (permalink / raw)
  To: Alan Cox; +Cc: David Miller, mjt, linux-kernel, autofs, raven, thomas, stable

On Fri, Apr 27, 2012 at 3:42 PM, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
>> Comments? The patch looks fairly simple. The "packetized pipe" might
>> even be useful for other users and maybe we might want to expose it as
>> an actual pipe fcntl, but right now the only thing that sets that flag
>> is autofs.
>
> The obvious way to set the flag would seem to me to also btake O_DIRECT
> on the pipe as meaning this ?

Hmm. That would work, then if you wanted to create a packet pipe, you'd just use

   if (pipe2(fd, O_DIRECT | O_NONBLOCK)) {
     perror("Kernel doesn't support packetized pipes");
     return -1;
  }

which means that if user space wants to use packetized pipes for other
things, it now has a nice way of testing whether the kernel supports
it (because older kernels would return -EINVAL - for once we actually
verify that only the flags we support are set).

And that maybe we wouldn't even need the extra flag in the "struct
pipe" at all - we could just check it in file->f_flags.

Let my try that.

                   Linus

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] Introduce a version6 of autofs interface, to fix design error.
  2012-04-27 22:42                         ` Linus Torvalds
@ 2012-04-27 22:56                           ` H. Peter Anvin
  2012-04-27 23:07                             ` Linus Torvalds
  0 siblings, 1 reply; 44+ messages in thread
From: H. Peter Anvin @ 2012-04-27 22:56 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Miller, mjt, linux-kernel, autofs, raven, thomas, stable

On 04/27/2012 03:42 PM, Linus Torvalds wrote:
> 
> Have you looked at the interface?
> 

Yes.  I designed and implemented the v1-3 versions of the interface.
v4-5 has kept the pipe, but at least v5 dropped the reading of the
header with the included length as a separate operation.

> The file descriptor is created in user space, and is defined to be a
> pipe. So both automount and systemd create a pipe, and then pass that
> pipe fd to the mount system call.
> 
> So it is not autofs that creates the file descriptor for the user.
> It's the other way around: the user creates (using "pipe()") the file
> descriptor, and passes it to autofs.

I would have used SOCK_SEQPACKET if it had existed today, and so if
there is going to be a change in the interface I would use it.

I guess, though, what you're saying is to change the semantic of the
existing pipe to return short reads on the receive end.  That won't work
since the daemon code is written to loop back and read more if it
doesn't fill the buffer, so it "swallows the comma."  Hence it's a new
interface no matter how you slice it.

	-hpa



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] Introduce a version6 of autofs interface, to fix design error.
  2012-04-27 22:56                           ` H. Peter Anvin
@ 2012-04-27 23:07                             ` Linus Torvalds
  2012-04-28  0:03                               ` H. Peter Anvin
  0 siblings, 1 reply; 44+ messages in thread
From: Linus Torvalds @ 2012-04-27 23:07 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: David Miller, mjt, linux-kernel, autofs, raven, thomas, stable

On Fri, Apr 27, 2012 at 3:56 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>
> I guess, though, what you're saying is to change the semantic of the
> existing pipe to return short reads on the receive end.

The important change is actually to make the read return the size requested.

So broken user space does a read() with the wrong size - and then
checks that it gets *exactly* that many bytes. Not more, not less.

The way to handle that is to
 - make sure the kernel always writes the maximally padded data
 - make the packetization simply drop any data that was in the packet
that the reader didn't ask for.

This is very much a semantic change, in that any client that tries to
read the packet with multiple reads (one 4-byte read to see the size,
followed by one "right-sized" read of the data) would be totally
screwed. The first read would indeed read the size, but it also -
because of the packetized interface - would simply drop the data, and
the next read would read the first bytes of the next packet.

But that's not what the autofs users actually do anyway. They just
read the whole packet.  So we can make *them* work. And the new
interface will be fairly robust (in fact, you could pass it some big
buffer and just know you always get exactly one packet, and never have
that whole stupid "sizeof()" at all).

                           Linus

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] Introduce a version6 of autofs interface, to fix design error.
  2012-04-27 22:42                       ` Alan Cox
  2012-04-27 22:49                         ` Linus Torvalds
@ 2012-04-27 23:27                         ` Linus Torvalds
  2012-04-28 16:10                           ` Linus Torvalds
  1 sibling, 1 reply; 44+ messages in thread
From: Linus Torvalds @ 2012-04-27 23:27 UTC (permalink / raw)
  To: Alan Cox; +Cc: David Miller, mjt, linux-kernel, autofs, raven, thomas, stable

[-- Attachment #1: Type: text/plain, Size: 1708 bytes --]

On Fri, Apr 27, 2012 at 3:42 PM, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
>
> The obvious way to set the flag would seem to me to also btake O_DIRECT
> on the pipe as meaning this ?

Ok, that really does seem to be pretty natural.

The attached patch implements that, and makes autofs automatically set
the O_DIRECT flag. And now that it is exposed to user-space, here's a
test-program that uses it:

  [torvalds@i5 ~]$ cat packet.c
  #define _GNU_SOURCE
  #include <unistd.h>
  #include <fcntl.h>
  #include <stdio.h>
  #include <stdlib.h>

  #define REPORT(op,fd,size) do { 			\
  	int ret = op(fd, buffer, size);			\
  	fprintf(stderr, #op "(%d)=%d\n", size, ret);	\
  } while (0)

  #define WRITE(x) REPORT(write,fd[1],x)
  #define READ(x) REPORT(read,fd[0],x)

  int main(void)
  {
  	int fd[2];
  	static char buffer[1024];

  	if (pipe2(fd, O_DIRECT)) {
  		perror("pipe2 O_DIRECT");
  		exit(1);
  	}
  	WRITE(8);
  	WRITE(12);
  	WRITE(300);
  	WRITE(5);
  	READ(64);
  	READ(64);
  	WRITE(7);
  	READ(64);
  	READ(64);
  	READ(64);
  	return 0;
  }

which results in this output:

  [torvalds@i5 ~]$ ./a.out
  write(8)=8
  write(12)=12
  write(300)=300
  write(5)=5
  read(64)=8
  read(64)=12
  write(7)=7
  read(64)=64
  read(64)=5
  read(64)=7

iow, notice how all the read() calls return individual packets (and
the 300-byte packet was truncated to 64 bytes).

HOWEVER. I haven't actually tested the autofs part. But I suspect it
"should just work".

Thomas, Michael, can you test whether this patch (again: note the "+8"
in autofs_v5_packet_size() purely for debug purposes, so this would be
interesting to test for other people too) makes your setups work?

                           Linus

[-- Attachment #2: patch.diff --]
[-- Type: application/octet-stream, Size: 5162 bytes --]

 fs/autofs4/autofs_i.h  |   12 ++++++++++++
 fs/autofs4/dev-ioctl.c |    2 +-
 fs/autofs4/inode.c     |    2 +-
 fs/autofs4/waitq.c     |   14 ++++++--------
 fs/pipe.c              |   17 +++++++++++------
 5 files changed, 31 insertions(+), 16 deletions(-)

diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
index eb1cc92cd67d..a07885bf0463 100644
--- a/fs/autofs4/autofs_i.h
+++ b/fs/autofs4/autofs_i.h
@@ -34,6 +34,7 @@
 #include <linux/sched.h>
 #include <linux/mount.h>
 #include <linux/namei.h>
+#include <linux/pipe_fs_i.h>
 #include <asm/current.h>
 #include <asm/uaccess.h>
 
@@ -270,6 +271,17 @@ int autofs4_fill_super(struct super_block *, void *, int);
 struct autofs_info *autofs4_new_ino(struct autofs_sb_info *);
 void autofs4_clean_ino(struct autofs_info *);
 
+static inline int autofs_prepare_pipe(struct file *pipe)
+{
+	if (!pipe->f_op || !pipe->f_op->write)
+		return -EINVAL;
+	if (!pipe->f_dentry->d_inode->i_pipe)
+		return -EINVAL;
+	/* We want a packet pipe */
+	pipe->f_flags |= O_DIRECT;
+	return 0;
+}
+
 /* Queue management functions */
 
 int autofs4_wait(struct autofs_sb_info *,struct dentry *, enum autofs_notify);
diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
index 9dacb8586701..6259e7142032 100644
--- a/fs/autofs4/dev-ioctl.c
+++ b/fs/autofs4/dev-ioctl.c
@@ -376,7 +376,7 @@ static int autofs_dev_ioctl_setpipefd(struct file *fp,
 			err = -EBADF;
 			goto out;
 		}
-		if (!pipe->f_op || !pipe->f_op->write) {
+		if (autofs_prepare_pipe(pipe) < 0) {
 			err = -EPIPE;
 			fput(pipe);
 			goto out;
diff --git a/fs/autofs4/inode.c b/fs/autofs4/inode.c
index d8dc002e9cc3..c525b74deefd 100644
--- a/fs/autofs4/inode.c
+++ b/fs/autofs4/inode.c
@@ -292,7 +292,7 @@ int autofs4_fill_super(struct super_block *s, void *data, int silent)
 		printk("autofs: could not open pipe file descriptor\n");
 		goto fail_dput;
 	}
-	if (!pipe->f_op || !pipe->f_op->write)
+	if (autofs_prepare_pipe(pipe) < 0)
 		goto fail_fput;
 	sbi->pipe = pipe;
 	sbi->pipefd = pipefd;
diff --git a/fs/autofs4/waitq.c b/fs/autofs4/waitq.c
index 9c098db43344..5ce5026200b9 100644
--- a/fs/autofs4/waitq.c
+++ b/fs/autofs4/waitq.c
@@ -97,16 +97,14 @@ static int autofs4_write(struct autofs_sb_info *sbi,
  *
  * The packets are identical on x86-32 and x86-64, but have different
  * alignment. Which means that 'sizeof()' will give different results.
- * Fix it up for the case of running 32-bit user mode on a 64-bit kernel.
+ * However, we packetize the pipe, so just use the bigger size, the
+ * pipe read will discard the rest.
+ *
+ * This adds on 8 bytes of garbage FOR DEBUGGING ONLY!
  */
-static noinline size_t autofs_v5_packet_size(struct autofs_sb_info *sbi)
+static inline size_t autofs_v5_packet_size(struct autofs_sb_info *sbi)
 {
-	size_t pktsz = sizeof(struct autofs_v5_packet);
-#if defined(CONFIG_X86_64) && defined(CONFIG_COMPAT)
-	if (sbi->compat_daemon > 0)
-		pktsz -= 4;
-#endif
-	return pktsz;
+	return sizeof(struct autofs_v5_packet)+8;
 }
 
 static void autofs4_notify_daemon(struct autofs_sb_info *sbi,
diff --git a/fs/pipe.c b/fs/pipe.c
index 25feaa3faac0..c65051eb6893 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -346,6 +346,11 @@ static const struct pipe_buf_operations anon_pipe_buf_ops = {
 	.get = generic_pipe_buf_get,
 };
 
+static inline int is_packetized(struct file *file)
+{
+	return (file->f_flags & O_DIRECT) != 0;
+}
+
 static ssize_t
 pipe_read(struct kiocb *iocb, const struct iovec *_iov,
 	   unsigned long nr_segs, loff_t pos)
@@ -407,7 +412,7 @@ redo:
 			ret += chars;
 			buf->offset += chars;
 			buf->len -= chars;
-			if (!buf->len) {
+			if (!buf->len || is_packetized(filp)) {
 				buf->ops = NULL;
 				ops->release(pipe, buf);
 				curbuf = (curbuf + 1) & (pipe->buffers - 1);
@@ -416,7 +421,7 @@ redo:
 				do_wakeup = 1;
 			}
 			total_len -= chars;
-			if (!total_len)
+			if (!total_len || is_packetized(filp))
 				break;	/* common path: read succeeded */
 		}
 		if (bufs)	/* More to do? */
@@ -490,7 +495,7 @@ pipe_write(struct kiocb *iocb, const struct iovec *_iov,
 
 	/* We try to merge small writes */
 	chars = total_len & (PAGE_SIZE-1); /* size of the last buffer */
-	if (pipe->nrbufs && chars != 0) {
+	if (!is_packetized(filp) && pipe->nrbufs && chars != 0) {
 		int lastbuf = (pipe->curbuf + pipe->nrbufs - 1) &
 							(pipe->buffers - 1);
 		struct pipe_buffer *buf = pipe->bufs + lastbuf;
@@ -1013,7 +1018,7 @@ struct file *create_write_pipe(int flags)
 		goto err_dentry;
 	f->f_mapping = inode->i_mapping;
 
-	f->f_flags = O_WRONLY | (flags & O_NONBLOCK);
+	f->f_flags = O_WRONLY | (flags & (O_NONBLOCK | O_DIRECT));
 	f->f_version = 0;
 
 	return f;
@@ -1046,7 +1051,7 @@ struct file *create_read_pipe(struct file *wrf, int flags)
 		return ERR_PTR(-ENFILE);
 
 	path_get(&wrf->f_path);
-	f->f_flags = O_RDONLY | (flags & O_NONBLOCK);
+	f->f_flags = O_RDONLY | (flags & (O_NONBLOCK | O_DIRECT));
 
 	return f;
 }
@@ -1057,7 +1062,7 @@ int do_pipe_flags(int *fd, int flags)
 	int error;
 	int fdw, fdr;
 
-	if (flags & ~(O_CLOEXEC | O_NONBLOCK))
+	if (flags & ~(O_CLOEXEC | O_NONBLOCK | O_DIRECT))
 		return -EINVAL;
 
 	fw = create_write_pipe(flags);

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] Introduce a version6 of autofs interface, to fix design error.
  2012-04-27 23:07                             ` Linus Torvalds
@ 2012-04-28  0:03                               ` H. Peter Anvin
  2012-04-28  0:17                                 ` Linus Torvalds
  0 siblings, 1 reply; 44+ messages in thread
From: H. Peter Anvin @ 2012-04-28  0:03 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: David Miller, mjt, linux-kernel, autofs, raven, thomas, stable

On 04/27/2012 04:07 PM, Linus Torvalds wrote:
> 
> The important change is actually to make the read return the size requested.
> 
> So broken user space does a read() with the wrong size - and then
> checks that it gets *exactly* that many bytes. Not more, not less.
> 
> The way to handle that is to
>  - make sure the kernel always writes the maximally padded data
>  - make the packetization simply drop any data that was in the packet
> that the reader didn't ask for.
> 
> This is very much a semantic change, in that any client that tries to
> read the packet with multiple reads (one 4-byte read to see the size,
> followed by one "right-sized" read of the data) would be totally
> screwed. The first read would indeed read the size, but it also -
> because of the packetized interface - would simply drop the data, and
> the next read would read the first bytes of the next packet.
> 
> But that's not what the autofs users actually do anyway. They just
> read the whole packet.  So we can make *them* work. And the new
> interface will be fairly robust (in fact, you could pass it some big
> buffer and just know you always get exactly one packet, and never have
> that whole stupid "sizeof()" at all).
> 

OK, I follow you now.  That would work for autofs; I presume it is not
something we would export to other users though?  If so I'd worry about
opening up new security issues.

Still, I have to admit... we have a grand total of three users of this
interface as far as we know (autofs, systemd, and am-utils if they ever
revved that one to v5.)  Would it really not be better to do the
zero-eating user space fix?

	-hpa



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] Introduce a version6 of autofs interface, to fix design error.
  2012-04-28  0:03                               ` H. Peter Anvin
@ 2012-04-28  0:17                                 ` Linus Torvalds
  0 siblings, 0 replies; 44+ messages in thread
From: Linus Torvalds @ 2012-04-28  0:17 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: David Miller, mjt, linux-kernel, autofs, raven, thomas, stable

On Fri, Apr 27, 2012 at 5:03 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>
> Still, I have to admit... we have a grand total of three users of this
> interface as far as we know (autofs, systemd, and am-utils if they ever
> revved that one to v5.)  Would it really not be better to do the
> zero-eating user space fix?

Considering the sh*t that is in user space now, and how distributions
tend to take years (if ever) to actually update some programs, I don't
see how that would actually work in practice.

                   Linus

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] Introduce a version6 of autofs interface, to fix design error.
  2012-04-27 19:14             ` David Miller
  2012-04-27 19:16               ` David Miller
@ 2012-04-28  1:56               ` Ian Kent
  1 sibling, 0 replies; 44+ messages in thread
From: Ian Kent @ 2012-04-28  1:56 UTC (permalink / raw)
  To: David Miller; +Cc: torvalds, mjt, linux-kernel, autofs, thomas, stable

On Fri, 2012-04-27 at 15:14 -0400, David Miller wrote:
> From: Linus Torvalds <torvalds@linux-foundation.org>
> Date: Fri, 27 Apr 2012 11:55:12 -0700
> 
> > On Fri, Apr 27, 2012 at 11:42 AM, Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> >>
> >> There's no question that systemd is broken.
> > 
> > Actually, I'll take that back.
> > 
> > Yes, systemd has breakage. But it's actually automount that is the
> > truly broken piece of sh*t.
> > 
> > I think that 'automount' is even more broken. The fact that the
> > automount maintainers knew about this, and added TOTALLY BROKEN code
> > to their automount source tree, over five years ago, because the
> > authors clearly did not understand what the f*ck they were doing,
> > that's the real problem.
> 
> I respectfully disagree.
> 
> It's ugly as shit, but it is the only one place where one can be
> absolutely sure that we are dealing with a pipe passing those v5
> things around.
> 
> All these hacks we have been talking about, assuming the mount means
> the pipe is for passing structure so-and-so around, and now trying
> to find some other check such as one on current->comm...
> 
> That's better?
> 
> Only the application really knows.  And I bet the person who wrote
> that automountd code you find so distasteful analyzed this and came
> to realize how difficult the kernel side would be to get right.

There's nothing I can say in my defense, Linus has made that very clear,
over and again.

When I realized the alignment mistake there were already shipping
binaries so I decided it was too late to change that.

And I see I got the analysis of the architectures alignment wrong, mmmm,
that's bad as well.

I guess I'll just be quiet now since anything I say is probably going to
be wrong.

Ian



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] Introduce a version6 of autofs interface, to fix design error.
  2012-04-27 23:27                         ` Linus Torvalds
@ 2012-04-28 16:10                           ` Linus Torvalds
  2012-04-29  6:37                             ` Michael Tokarev
  0 siblings, 1 reply; 44+ messages in thread
From: Linus Torvalds @ 2012-04-28 16:10 UTC (permalink / raw)
  To: Alan Cox, Thomas Meyer
  Cc: David Miller, mjt, linux-kernel, autofs, raven, stable

[-- Attachment #1: Type: text/plain, Size: 658 bytes --]

On Fri, Apr 27, 2012 at 4:27 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> The attached patch implements that, and makes autofs automatically set
> the O_DIRECT flag.

Ok, since the packetized pipe approach makes the "compat" games in
autofs unnecessary (autofs can just always do the bigger write), I
have now reverted the commit that started this issue in my tree (and
marked it for stable). So automount should work again.

However, I would still like to know if systemd works with the attached
patch (it's pretty much the same patch as before, but the revert made
for some changes in the autofs code).

Thomas?

                    Linus

[-- Attachment #2: patch.diff --]
[-- Type: application/octet-stream, Size: 4286 bytes --]

 fs/autofs4/autofs_i.h  |   11 +++++++++++
 fs/autofs4/dev-ioctl.c |    2 +-
 fs/autofs4/inode.c     |    2 +-
 fs/autofs4/waitq.c     |    2 +-
 fs/pipe.c              |   17 +++++++++++------
 5 files changed, 25 insertions(+), 9 deletions(-)

diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
index d8d8e7ba6a1e..5817343bad32 100644
--- a/fs/autofs4/autofs_i.h
+++ b/fs/autofs4/autofs_i.h
@@ -269,6 +269,17 @@ int autofs4_fill_super(struct super_block *, void *, int);
 struct autofs_info *autofs4_new_ino(struct autofs_sb_info *);
 void autofs4_clean_ino(struct autofs_info *);
 
+static inline int autofs_prepare_pipe(struct file *pipe)
+{
+	if (!pipe->f_op || !pipe->f_op->write)
+		return -EINVAL;
+	if (!pipe->f_dentry->d_inode->i_pipe)
+		return -EINVAL;
+	/* We want a packet pipe */
+	pipe->f_flags |= O_DIRECT;
+	return 0;
+}
+
 /* Queue management functions */
 
 int autofs4_wait(struct autofs_sb_info *,struct dentry *, enum autofs_notify);
diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
index 3dfd615afb6b..aa9103f8f01b 100644
--- a/fs/autofs4/dev-ioctl.c
+++ b/fs/autofs4/dev-ioctl.c
@@ -376,7 +376,7 @@ static int autofs_dev_ioctl_setpipefd(struct file *fp,
 			err = -EBADF;
 			goto out;
 		}
-		if (!pipe->f_op || !pipe->f_op->write) {
+		if (autofs_prepare_pipe(pipe) < 0) {
 			err = -EPIPE;
 			fput(pipe);
 			goto out;
diff --git a/fs/autofs4/inode.c b/fs/autofs4/inode.c
index 14c7bc02349e..6e488ebe7784 100644
--- a/fs/autofs4/inode.c
+++ b/fs/autofs4/inode.c
@@ -290,7 +290,7 @@ int autofs4_fill_super(struct super_block *s, void *data, int silent)
 		printk("autofs: could not open pipe file descriptor\n");
 		goto fail_dput;
 	}
-	if (!pipe->f_op || !pipe->f_op->write)
+	if (autofs_prepare_pipe(pipe) < 0)
 		goto fail_fput;
 	sbi->pipe = pipe;
 	sbi->pipefd = pipefd;
diff --git a/fs/autofs4/waitq.c b/fs/autofs4/waitq.c
index da8876d38a7b..f624cd0ff84d 100644
--- a/fs/autofs4/waitq.c
+++ b/fs/autofs4/waitq.c
@@ -91,7 +91,7 @@ static int autofs4_write(struct autofs_sb_info *sbi,
 
 	return (bytes > 0);
 }
-	
+
 static void autofs4_notify_daemon(struct autofs_sb_info *sbi,
 				 struct autofs_wait_queue *wq,
 				 int type)
diff --git a/fs/pipe.c b/fs/pipe.c
index 25feaa3faac0..c65051eb6893 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -346,6 +346,11 @@ static const struct pipe_buf_operations anon_pipe_buf_ops = {
 	.get = generic_pipe_buf_get,
 };
 
+static inline int is_packetized(struct file *file)
+{
+	return (file->f_flags & O_DIRECT) != 0;
+}
+
 static ssize_t
 pipe_read(struct kiocb *iocb, const struct iovec *_iov,
 	   unsigned long nr_segs, loff_t pos)
@@ -407,7 +412,7 @@ redo:
 			ret += chars;
 			buf->offset += chars;
 			buf->len -= chars;
-			if (!buf->len) {
+			if (!buf->len || is_packetized(filp)) {
 				buf->ops = NULL;
 				ops->release(pipe, buf);
 				curbuf = (curbuf + 1) & (pipe->buffers - 1);
@@ -416,7 +421,7 @@ redo:
 				do_wakeup = 1;
 			}
 			total_len -= chars;
-			if (!total_len)
+			if (!total_len || is_packetized(filp))
 				break;	/* common path: read succeeded */
 		}
 		if (bufs)	/* More to do? */
@@ -490,7 +495,7 @@ pipe_write(struct kiocb *iocb, const struct iovec *_iov,
 
 	/* We try to merge small writes */
 	chars = total_len & (PAGE_SIZE-1); /* size of the last buffer */
-	if (pipe->nrbufs && chars != 0) {
+	if (!is_packetized(filp) && pipe->nrbufs && chars != 0) {
 		int lastbuf = (pipe->curbuf + pipe->nrbufs - 1) &
 							(pipe->buffers - 1);
 		struct pipe_buffer *buf = pipe->bufs + lastbuf;
@@ -1013,7 +1018,7 @@ struct file *create_write_pipe(int flags)
 		goto err_dentry;
 	f->f_mapping = inode->i_mapping;
 
-	f->f_flags = O_WRONLY | (flags & O_NONBLOCK);
+	f->f_flags = O_WRONLY | (flags & (O_NONBLOCK | O_DIRECT));
 	f->f_version = 0;
 
 	return f;
@@ -1046,7 +1051,7 @@ struct file *create_read_pipe(struct file *wrf, int flags)
 		return ERR_PTR(-ENFILE);
 
 	path_get(&wrf->f_path);
-	f->f_flags = O_RDONLY | (flags & O_NONBLOCK);
+	f->f_flags = O_RDONLY | (flags & (O_NONBLOCK | O_DIRECT));
 
 	return f;
 }
@@ -1057,7 +1062,7 @@ int do_pipe_flags(int *fd, int flags)
 	int error;
 	int fdw, fdr;
 
-	if (flags & ~(O_CLOEXEC | O_NONBLOCK))
+	if (flags & ~(O_CLOEXEC | O_NONBLOCK | O_DIRECT))
 		return -EINVAL;
 
 	fw = create_write_pipe(flags);

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] Introduce a version6 of autofs interface, to fix design error.
  2012-04-27 20:37       ` H. Peter Anvin
@ 2012-04-28 22:20         ` Mark Lord
  0 siblings, 0 replies; 44+ messages in thread
From: Mark Lord @ 2012-04-28 22:20 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Michael Tokarev, Linus Torvalds, Linux-kernel, autofs, Ian Kent,
	Thomas Meyer, stable

On 12-04-27 04:37 PM, H. Peter Anvin wrote:
> On 04/27/2012 08:47 AM, Mark Lord wrote:
>> On 12-04-27 05:45 AM, Michael Tokarev wrote:
>> ..
>>> Please note: the talk is about 32bit userspace on 64bit kernel,
>>> which should be quite rare these days, or something of less
>>> priority.
>>
>>
>> I suspect 32bit-PAE is more common than 32/64,
>> but both are valid and not uncommon in the wild.
> 
> 32-on-64 is a very important use case, because it lets us get people off
> 32-bit kernels.

Agreed.  But -PAE is even more important for the millions of people
with NVIDIA graphics controllers in their machines, or at least if
they want decent video playback.  :)

The otherwise-excellent NVIDIA drivers don't work in 32/64 mode,
but they do handle -PAE just fine.

Cheers

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] Introduce a version6 of autofs interface, to fix design error.
  2012-04-28 16:10                           ` Linus Torvalds
@ 2012-04-29  6:37                             ` Michael Tokarev
  2012-04-29  7:19                               ` Linus Torvalds
  0 siblings, 1 reply; 44+ messages in thread
From: Michael Tokarev @ 2012-04-29  6:37 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Cox, Thomas Meyer, David Miller, linux-kernel, autofs,
	raven, stable

On 28.04.2012 20:10, Linus Torvalds wrote:
> On Fri, Apr 27, 2012 at 4:27 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> The attached patch implements that, and makes autofs automatically set
>> the O_DIRECT flag.
> 
> Ok, since the packetized pipe approach makes the "compat" games in
> autofs unnecessary (autofs can just always do the bigger write), I
> have now reverted the commit that started this issue in my tree (and
> marked it for stable). So automount should work again.
> 
> However, I would still like to know if systemd works with the attached
> patch (it's pretty much the same patch as before, but the revert made
> for some changes in the autofs code).
> 
> Thomas?

I installed 32bit userspace together with systemd in a vrtual machine
today, and tried it with this patch (and with the previous autofs
'fix' reverted).

And no, it does not quite work.  Strace shows that systemd correctly
reads first 300 bytes, but next read returns 4 bytes, so it reads
these and waits for next 300-4 = 296 bytes of data.

/proc/1/fdinfo/16  -- the autofs pipe fd# -- shows flags=02004000
which is O_CLOEXEC|O_NONBLOCK, but this is the other end of the
pipe..  shouldn't the READ side of the pipe have O_DIRECT flag now?

Thanks,

/mjt


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] Introduce a version6 of autofs interface, to fix design error.
  2012-04-29  6:37                             ` Michael Tokarev
@ 2012-04-29  7:19                               ` Linus Torvalds
  2012-04-29  7:45                                 ` Michael Tokarev
  0 siblings, 1 reply; 44+ messages in thread
From: Linus Torvalds @ 2012-04-29  7:19 UTC (permalink / raw)
  To: Michael Tokarev
  Cc: Alan Cox, Thomas Meyer, David Miller, linux-kernel, autofs,
	raven, stable

On Sat, Apr 28, 2012 at 11:37 PM, Michael Tokarev <mjt@tls.msk.ru> wrote:
>
> And no, it does not quite work.  Strace shows that systemd correctly
> reads first 300 bytes, but next read returns 4 bytes, so it reads
> these and waits for next 300-4 = 296 bytes of data.
>
> /proc/1/fdinfo/16  -- the autofs pipe fd# -- shows flags=02004000
> which is O_CLOEXEC|O_NONBLOCK, but this is the other end of the
> pipe..  shouldn't the READ side of the pipe have O_DIRECT flag now?

Gaah, it should, but it won't.

I bet my original patch worked fine, because the pipe has only one
inode and pipe structure. But it has *two* 'struct file's associated
with it, and autofs only ever sees the writing side, and never gets to
mark the reading side O_DIRECT. So yeah, the reading side won't do the
proper packetized read.

Duh. So we need to put the flag back in the pipe_inode_info. Too bad -
because the O_DIRECT approach was so nice in other ways.

                     Linus

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] Introduce a version6 of autofs interface, to fix design error.
  2012-04-29  7:19                               ` Linus Torvalds
@ 2012-04-29  7:45                                 ` Michael Tokarev
  2012-04-29 18:29                                   ` Linus Torvalds
  0 siblings, 1 reply; 44+ messages in thread
From: Michael Tokarev @ 2012-04-29  7:45 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Cox, Thomas Meyer, David Miller, linux-kernel, autofs,
	raven, stable

[Hit "reply" instead of "Reply to all".. I'm sorry for a repost]

On 29.04.2012 11:19, Linus Torvalds wrote:
[]
>> /proc/1/fdinfo/16  -- the autofs pipe fd# -- shows flags=02004000
>> which is O_CLOEXEC|O_NONBLOCK, but this is the other end of the
>> pipe..  shouldn't the READ side of the pipe have O_DIRECT flag now?
>
> Gaah, it should, but it won't.
>
> I bet my original patch worked fine, because the pipe has only one
> inode and pipe structure. But it has *two* 'struct file's associated
> with it, and autofs only ever sees the writing side, and never gets to
> mark the reading side O_DIRECT. So yeah, the reading side won't do the
> proper packetized read.

Can't we go - in kernel - from one struct file to pipe structure to
another file structure and set O_DIRECT there?  Autofs kernel code
checks if the file descriptor is a pipe, so it should be possible...

Thanks,

/mjt

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] Introduce a version6 of autofs interface, to fix design error.
  2012-04-29  7:45                                 ` Michael Tokarev
@ 2012-04-29 18:29                                   ` Linus Torvalds
  2012-04-29 19:09                                     ` Linus Torvalds
  0 siblings, 1 reply; 44+ messages in thread
From: Linus Torvalds @ 2012-04-29 18:29 UTC (permalink / raw)
  To: Michael Tokarev
  Cc: Alan Cox, Thomas Meyer, David Miller, linux-kernel, autofs,
	raven, stable

On Sun, Apr 29, 2012 at 12:45 AM, Michael Tokarev <mjt@tls.msk.ru> wrote:
> On 29.04.2012 11:19, Linus Torvalds wrote:
>>
>> Gaah, it should, but it won't.
>>
>> I bet my original patch worked fine, because the pipe has only one
>> inode and pipe structure. But it has *two* 'struct file's associated
>> with it, and autofs only ever sees the writing side, and never gets to
>> mark the reading side O_DIRECT. So yeah, the reading side won't do the
>> proper packetized read.
>
> Can't we go - in kernel - from one struct file to pipe structure to
> another file structure and set O_DIRECT there?  Autofs kernel code
> checks if the file descriptor is a pipe, so it should be possible...

Nope. We don't have any back-pointers to the other "struct file". They
have the inode - and the "struct pipe_inode_info" in common, but
there's no way to find the other "struct file" from that.

And sure, a pipe in Linux could work with a single read-write "struct
file" that is used both for the reading and writing, but that's not
how the "pipe()" system call is defined - it gets two separate file
descriptors, one read-only, one write-only. So that's what autofs and
systemd hands the kernel - the write side of the pipe. And we can't
find the read-side one.

But I think I have another approach. We can make the *writer* be the
only one that cares about the packetized nature, and if it's a packet
write, it will set a PIPE_BUF_PACKET flag in the pipe buffer. The
reader can react to that. I think that actually has the potential to
make the code a bit prettier too..

I'll cook it up.

                     Linus

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] Introduce a version6 of autofs interface, to fix design error.
  2012-04-29 18:29                                   ` Linus Torvalds
@ 2012-04-29 19:09                                     ` Linus Torvalds
  2012-04-29 19:53                                       ` Michael Tokarev
  0 siblings, 1 reply; 44+ messages in thread
From: Linus Torvalds @ 2012-04-29 19:09 UTC (permalink / raw)
  To: Michael Tokarev
  Cc: Alan Cox, Thomas Meyer, David Miller, linux-kernel, autofs,
	raven, stable

[-- Attachment #1: Type: text/plain, Size: 1046 bytes --]

On Sun, Apr 29, 2012 at 11:29 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> But I think I have another approach. We can make the *writer* be the
> only one that cares about the packetized nature, and if it's a packet
> write, it will set a PIPE_BUF_PACKET flag in the pipe buffer. The
> reader can react to that. I think that actually has the potential to
> make the code a bit prettier too..

Patch attached.

This keeps Alan's idea of using O_DIRECT, but only makes it matter for
the writer - because now the "packet" thing is a per-pipe-buffer state
(we already had per-pipe-buffer flags, so this is not anything new).
So when you *write* using a O_DIRECT pipe, it will create packetized
buffers, and the reader just sees that directly.

So this should "just work", and doesn't need any extra flags in the
pipe_inode_info. And the user space interface remains the same: you
can create these packet pipes with "pipe2(fd, O_DIRECT)" if you want
to, and my test-program gives exactly the same output.

                      Linus

[-- Attachment #2: patch.diff --]
[-- Type: application/octet-stream, Size: 4757 bytes --]

 fs/autofs4/autofs_i.h     |   11 +++++++++++
 fs/autofs4/dev-ioctl.c    |    2 +-
 fs/autofs4/inode.c        |    2 +-
 fs/autofs4/waitq.c        |    2 +-
 fs/pipe.c                 |   31 +++++++++++++++++++++++++++++--
 include/linux/pipe_fs_i.h |    1 +
 6 files changed, 44 insertions(+), 5 deletions(-)

diff --git a/fs/autofs4/autofs_i.h b/fs/autofs4/autofs_i.h
index d8d8e7ba6a1e..908e18455413 100644
--- a/fs/autofs4/autofs_i.h
+++ b/fs/autofs4/autofs_i.h
@@ -269,6 +269,17 @@ int autofs4_fill_super(struct super_block *, void *, int);
 struct autofs_info *autofs4_new_ino(struct autofs_sb_info *);
 void autofs4_clean_ino(struct autofs_info *);
 
+static inline int autofs_prepare_pipe(struct file *pipe)
+{
+	if (!pipe->f_op || !pipe->f_op->write)
+		return -EINVAL;
+	if (!S_ISFIFO(pipe->f_dentry->d_inode->i_mode))
+		return -EINVAL;
+	/* We want a packet pipe */
+	pipe->f_flags |= O_DIRECT;
+	return 0;
+}
+
 /* Queue management functions */
 
 int autofs4_wait(struct autofs_sb_info *,struct dentry *, enum autofs_notify);
diff --git a/fs/autofs4/dev-ioctl.c b/fs/autofs4/dev-ioctl.c
index 3dfd615afb6b..aa9103f8f01b 100644
--- a/fs/autofs4/dev-ioctl.c
+++ b/fs/autofs4/dev-ioctl.c
@@ -376,7 +376,7 @@ static int autofs_dev_ioctl_setpipefd(struct file *fp,
 			err = -EBADF;
 			goto out;
 		}
-		if (!pipe->f_op || !pipe->f_op->write) {
+		if (autofs_prepare_pipe(pipe) < 0) {
 			err = -EPIPE;
 			fput(pipe);
 			goto out;
diff --git a/fs/autofs4/inode.c b/fs/autofs4/inode.c
index 14c7bc02349e..6e488ebe7784 100644
--- a/fs/autofs4/inode.c
+++ b/fs/autofs4/inode.c
@@ -290,7 +290,7 @@ int autofs4_fill_super(struct super_block *s, void *data, int silent)
 		printk("autofs: could not open pipe file descriptor\n");
 		goto fail_dput;
 	}
-	if (!pipe->f_op || !pipe->f_op->write)
+	if (autofs_prepare_pipe(pipe) < 0)
 		goto fail_fput;
 	sbi->pipe = pipe;
 	sbi->pipefd = pipefd;
diff --git a/fs/autofs4/waitq.c b/fs/autofs4/waitq.c
index da8876d38a7b..f624cd0ff84d 100644
--- a/fs/autofs4/waitq.c
+++ b/fs/autofs4/waitq.c
@@ -91,7 +91,7 @@ static int autofs4_write(struct autofs_sb_info *sbi,
 
 	return (bytes > 0);
 }
-	
+
 static void autofs4_notify_daemon(struct autofs_sb_info *sbi,
 				 struct autofs_wait_queue *wq,
 				 int type)
diff --git a/fs/pipe.c b/fs/pipe.c
index 25feaa3faac0..fec5e4ad071a 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -346,6 +346,16 @@ static const struct pipe_buf_operations anon_pipe_buf_ops = {
 	.get = generic_pipe_buf_get,
 };
 
+static const struct pipe_buf_operations packet_pipe_buf_ops = {
+	.can_merge = 0,
+	.map = generic_pipe_buf_map,
+	.unmap = generic_pipe_buf_unmap,
+	.confirm = generic_pipe_buf_confirm,
+	.release = anon_pipe_buf_release,
+	.steal = generic_pipe_buf_steal,
+	.get = generic_pipe_buf_get,
+};
+
 static ssize_t
 pipe_read(struct kiocb *iocb, const struct iovec *_iov,
 	   unsigned long nr_segs, loff_t pos)
@@ -407,6 +417,13 @@ redo:
 			ret += chars;
 			buf->offset += chars;
 			buf->len -= chars;
+
+			/* Was it a packet buffer? Clean up and exit */
+			if (buf->flags & PIPE_BUF_FLAG_PACKET) {
+				total_len = chars;
+				buf->len = 0;
+			}
+
 			if (!buf->len) {
 				buf->ops = NULL;
 				ops->release(pipe, buf);
@@ -459,6 +476,11 @@ redo:
 	return ret;
 }
 
+static inline int is_packetized(struct file *file)
+{
+	return (file->f_flags & O_DIRECT) != 0;
+}
+
 static ssize_t
 pipe_write(struct kiocb *iocb, const struct iovec *_iov,
 	    unsigned long nr_segs, loff_t ppos)
@@ -593,6 +615,11 @@ redo2:
 			buf->ops = &anon_pipe_buf_ops;
 			buf->offset = 0;
 			buf->len = chars;
+			buf->flags = 0;
+			if (is_packetized(filp)) {
+				buf->ops = &packet_pipe_buf_ops;
+				buf->flags = PIPE_BUF_FLAG_PACKET;
+			}
 			pipe->nrbufs = ++bufs;
 			pipe->tmp_page = NULL;
 
@@ -1013,7 +1040,7 @@ struct file *create_write_pipe(int flags)
 		goto err_dentry;
 	f->f_mapping = inode->i_mapping;
 
-	f->f_flags = O_WRONLY | (flags & O_NONBLOCK);
+	f->f_flags = O_WRONLY | (flags & (O_NONBLOCK | O_DIRECT));
 	f->f_version = 0;
 
 	return f;
@@ -1057,7 +1084,7 @@ int do_pipe_flags(int *fd, int flags)
 	int error;
 	int fdw, fdr;
 
-	if (flags & ~(O_CLOEXEC | O_NONBLOCK))
+	if (flags & ~(O_CLOEXEC | O_NONBLOCK | O_DIRECT))
 		return -EINVAL;
 
 	fw = create_write_pipe(flags);
diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h
index 6d626ff0cfd0..e1ac1ce16fb0 100644
--- a/include/linux/pipe_fs_i.h
+++ b/include/linux/pipe_fs_i.h
@@ -6,6 +6,7 @@
 #define PIPE_BUF_FLAG_LRU	0x01	/* page is on the LRU */
 #define PIPE_BUF_FLAG_ATOMIC	0x02	/* was atomically mapped */
 #define PIPE_BUF_FLAG_GIFT	0x04	/* page is a gift */
+#define PIPE_BUF_FLAG_PACKET	0x08	/* read() as a packet */
 
 /**
  *	struct pipe_buffer - a linux kernel pipe buffer

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] Introduce a version6 of autofs interface, to fix design error.
  2012-04-29 19:09                                     ` Linus Torvalds
@ 2012-04-29 19:53                                       ` Michael Tokarev
  2012-04-29 20:53                                         ` Linus Torvalds
  2012-04-30  8:41                                         ` Thomas Meyer
  0 siblings, 2 replies; 44+ messages in thread
From: Michael Tokarev @ 2012-04-29 19:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Cox, Thomas Meyer, David Miller, linux-kernel, autofs,
	raven, stable

On 29.04.2012 23:09, Linus Torvalds wrote:
> 
> This keeps Alan's idea of using O_DIRECT, but only makes it matter for
> the writer - because now the "packet" thing is a per-pipe-buffer state
> (we already had per-pipe-buffer flags, so this is not anything new).
> So when you *write* using a O_DIRECT pipe, it will create packetized
> buffers, and the reader just sees that directly.
> 
> So this should "just work", and doesn't need any extra flags in the
> pipe_inode_info. And the user space interface remains the same: you
> can create these packet pipes with "pipe2(fd, O_DIRECT)" if you want
> to, and my test-program gives exactly the same output.

Ok.  I verified this on 3.3 kernel (with the original fix reverted),
and now both autofs5 and systemd works.

I also verified it on 3.0 kernel (3.0.30), the patch also applies
there just fine (with 1..2 lines offsets) and works too -- at least
on 32/64 bits.

So, the result appears to be an excellent solution to a bad problem...

Thank you all for the efforts and support!

You can add my

Tested-off-by: Michael Tokarev <mjt@tls.msk.ru>

(I'd add a Signed-off-by, but it is not my patch ;)

/mjt

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] Introduce a version6 of autofs interface, to fix design error.
  2012-04-29 19:53                                       ` Michael Tokarev
@ 2012-04-29 20:53                                         ` Linus Torvalds
  2012-04-30  8:41                                         ` Thomas Meyer
  1 sibling, 0 replies; 44+ messages in thread
From: Linus Torvalds @ 2012-04-29 20:53 UTC (permalink / raw)
  To: Michael Tokarev
  Cc: Alan Cox, Thomas Meyer, David Miller, linux-kernel, autofs,
	raven, stable

On Sun, Apr 29, 2012 at 12:53 PM, Michael Tokarev <mjt@tls.msk.ru> wrote:
>
> Ok.  I verified this on 3.3 kernel (with the original fix reverted),
> and now both autofs5 and systemd works.
>
> I also verified it on 3.0 kernel (3.0.30), the patch also applies
> there just fine (with 1..2 lines offsets) and works too -- at least
> on 32/64 bits.
>
> So, the result appears to be an excellent solution to a bad problem...
>
> Thank you all for the efforts and support!

And thank *you* for reporting and testing and staying with this.

I've committed the patch (as two separate patches: the first does the
pipe infrastructure, the second adds the code in autofs to enable the
use of it), and tried to comment on it extensively in the changelogs.

I also marked the patches for stable, but I would suggest that the
stable team wait a while before applying it, just to verify that we
don't have some crazy third autofs pipe user that does the packet as
many smaller reads. I don't think that's a sane thing to do, and I
don't think it happens, but by now I'm getting a big paranoid about
this whole thing.

Because I really hope this means that we can close this issue for good.

                      Linus

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v2] Introduce a version6 of autofs interface, to fix design error.
  2012-04-29 19:53                                       ` Michael Tokarev
  2012-04-29 20:53                                         ` Linus Torvalds
@ 2012-04-30  8:41                                         ` Thomas Meyer
  1 sibling, 0 replies; 44+ messages in thread
From: Thomas Meyer @ 2012-04-30  8:41 UTC (permalink / raw)
  To: Michael Tokarev
  Cc: Linus Torvalds, Alan Cox, David Miller, linux-kernel, autofs,
	raven, stable

Am Sonntag, den 29.04.2012, 23:53 +0400 schrieb Michael Tokarev:
> On 29.04.2012 23:09, Linus Torvalds wrote:
> > 
> > This keeps Alan's idea of using O_DIRECT, but only makes it matter for
> > the writer - because now the "packet" thing is a per-pipe-buffer state
> > (we already had per-pipe-buffer flags, so this is not anything new).
> > So when you *write* using a O_DIRECT pipe, it will create packetized
> > buffers, and the reader just sees that directly.
> > 
> > So this should "just work", and doesn't need any extra flags in the
> > pipe_inode_info. And the user space interface remains the same: you
> > can create these packet pipes with "pipe2(fd, O_DIRECT)" if you want
> > to, and my test-program gives exactly the same output.
> 
> Ok.  I verified this on 3.3 kernel (with the original fix reverted),
> and now both autofs5 and systemd works.

hi,

I can also confirm that "autofs: make the autofsv5 packet file
descriptor use a packetized pipe" and "pipes: add a "packetized pipe"
mode for writing" on top of 3.3.4 works correctly at least with the
autofs feature of systemd.

And sorry for the delayed answer, but I've been away from the internet
for a few days.

and a big sorry for all the trouble...

with kind regards
thomas


^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2012-04-30  8:41 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-26 13:34 [PATCH v2] Introduce a version6 of autofs interface, to fix design error Michael Tokarev
2012-04-26 13:44 ` Michael Tokarev
2012-04-27  0:37 ` Linus Torvalds
2012-04-27  9:45   ` Michael Tokarev
2012-04-27 15:47     ` Mark Lord
2012-04-27 20:37       ` H. Peter Anvin
2012-04-28 22:20         ` Mark Lord
2012-04-27 16:22     ` David Miller
2012-04-27 17:10       ` Michael Tokarev
2012-04-27 17:28         ` David Miller
2012-04-27 18:19     ` Linus Torvalds
2012-04-27 18:34       ` David Miller
2012-04-27 18:42         ` Linus Torvalds
2012-04-27 18:55           ` Linus Torvalds
2012-04-27 19:14             ` David Miller
2012-04-27 19:16               ` David Miller
2012-04-27 19:19                 ` Linus Torvalds
2012-04-27 19:24                   ` David Miller
2012-04-27 19:56                     ` Linus Torvalds
2012-04-27 20:13                       ` Stef Bon
2012-04-27 20:29                       ` David Miller
2012-04-27 22:40                         ` Linus Torvalds
2012-04-27 20:43                       ` H. Peter Anvin
2012-04-27 22:42                         ` Linus Torvalds
2012-04-27 22:56                           ` H. Peter Anvin
2012-04-27 23:07                             ` Linus Torvalds
2012-04-28  0:03                               ` H. Peter Anvin
2012-04-28  0:17                                 ` Linus Torvalds
2012-04-27 22:42                       ` Alan Cox
2012-04-27 22:49                         ` Linus Torvalds
2012-04-27 23:27                         ` Linus Torvalds
2012-04-28 16:10                           ` Linus Torvalds
2012-04-29  6:37                             ` Michael Tokarev
2012-04-29  7:19                               ` Linus Torvalds
2012-04-29  7:45                                 ` Michael Tokarev
2012-04-29 18:29                                   ` Linus Torvalds
2012-04-29 19:09                                     ` Linus Torvalds
2012-04-29 19:53                                       ` Michael Tokarev
2012-04-29 20:53                                         ` Linus Torvalds
2012-04-30  8:41                                         ` Thomas Meyer
2012-04-28  1:56               ` Ian Kent
2012-04-27 19:08           ` David Miller
2012-04-27 20:45             ` H. Peter Anvin
2012-04-27 20:42       ` H. Peter Anvin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.