* [PATCH v2] pseudo: Upgrade to latest to fix openat() with a directory symlink
@ 2019-08-01 20:03 Jason Wessel
  2019-08-01 21:37 ` [PATCH v2] pseudo: Upgrade to latest to fix openat() with a directory symlink [NAK] Jason Wessel
  0 siblings, 1 reply; 15+ messages in thread
From: Jason Wessel @ 2019-08-01 20:03 UTC (permalink / raw)
  To: openembedded-core

While working with ostree disk generation in conjunction with wic, I
found a problem with pseudo where it tried to resolve a symlink when
it shouldn't, based on openat() flags.  I narrowed down the problem to
a simple C program to reproduce the issue:

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <dirent.h>
#include <fcntl.h>

int main()
{
    /* Tested with: gcc -Wall -o app app.c ; echo "no pseudo" ; ./app ; echo "pseudo"; pseudo ./app */
    system("rm -rf tdir tlink");
    system("mkdir tdir");
    system("ln -s tdir tlink");
    DIR *dir = opendir(".");
    int dfd = dirfd(dir);

    /* O_NOFOLLOW on a symlink must make openat() fail rather than follow the link */
    int target_dfd = openat (dfd, "tlink", O_RDONLY | O_NONBLOCK | O_DIRECTORY | O_CLOEXEC | O_NOFOLLOW);
    if (target_dfd == -1) {
        printf("This is right\n");
    } else {
        printf("This is broken\n");
    }
    return 0;
}

Many thanks to Peter Seebach for fixing the problem in the pseudo code
to use the same logic which was already there for the
AT_SYMLINK_NOFOLLOW.

Also updated is the license MD5 checksum since the master branch of
pseudo has had the SPDX data updated.

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
---
 meta/recipes-devtools/pseudo/pseudo.inc    | 2 +-
 meta/recipes-devtools/pseudo/pseudo_git.bb | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/meta/recipes-devtools/pseudo/pseudo.inc b/meta/recipes-devtools/pseudo/pseudo.inc
index 8b05735bb1..8b34909726 100644
--- a/meta/recipes-devtools/pseudo/pseudo.inc
+++ b/meta/recipes-devtools/pseudo/pseudo.inc
@@ -4,7 +4,7 @@
 
 SUMMARY = "Pseudo gives fake root capabilities to a normal user"
 HOMEPAGE = "http://git.yoctoproject.org/cgit/cgit.cgi/pseudo"
-LIC_FILES_CHKSUM = "file://COPYING;md5=243b725d71bb5df4a1e5920b344b86ad"
+LIC_FILES_CHKSUM = "file://COPYING;md5=a1d8023a6f953ac6ea4af765ff62d574"
 SECTION = "base"
 LICENSE = "LGPL2.1"
 DEPENDS = "sqlite3 attr"
diff --git a/meta/recipes-devtools/pseudo/pseudo_git.bb b/meta/recipes-devtools/pseudo/pseudo_git.bb
index 51db84c4d4..3350c3fabd 100644
--- a/meta/recipes-devtools/pseudo/pseudo_git.bb
+++ b/meta/recipes-devtools/pseudo/pseudo_git.bb
@@ -8,7 +8,7 @@ SRC_URI = "git://git.yoctoproject.org/pseudo \
            file://toomanyfiles.patch \
            "
 
-SRCREV = "3fa7c853e0bcd6fe23f7524c2a3c9e3af90901c3"
+SRCREV = "097ca3e245200c4a4333964af59a106c42ff3bca"
 S = "${WORKDIR}/git"
 PV = "1.9.0+git${SRCPV}"
 
-- 
2.21.0




* Re: [PATCH v2] pseudo: Upgrade to latest to fix openat() with a directory symlink [NAK]
  2019-08-01 20:03 [PATCH v2] pseudo: Upgrade to latest to fix openat() with a directory symlink Jason Wessel
@ 2019-08-01 21:37 ` Jason Wessel
  2019-08-01 23:57   ` Seebs
  0 siblings, 1 reply; 15+ messages in thread
From: Jason Wessel @ 2019-08-01 21:37 UTC (permalink / raw)
  To: openembedded-core, seebs

While this is a real problem, we need to put this patch on hold.

It seems to have caused really odd problems with the oe link management that were not there previously, such as:


WARNING: pinentry-1.1.0-r0 do_package_qa: QA Issue: pinentry: /usr/bin/pinentry is owned by uid 5002, which is the same as the user running bitbake. This may be due to host contamination [host-user-contaminated]

I'll continue to look into the problem.

Cheers,
Jason.

On 8/1/19 3:03 PM, Jason Wessel wrote:
> While working with ostree disk generation in conjunction with wic, I
> found a problem with pseudo where it tried to resolve a symlink when
> it shouldn't, based on openat() flags.  I narrowed down the problem to
> a simple c program to reproduce the issue:
>
> int main()
> {
>      /* Tested with: gcc -Wall -o app app.c ; echo "no pseudo" ; ./app ; echo "pseudo"; pseudo ./app */
>      system("rm -rf tdir tlink");
>      system("mkdir tdir");
>      system("ln -s tdir tlink");
>      DIR *dir = opendir(".");
>      int dfd = dirfd(dir);
>
>      int target_dfd = openat (dfd, "tlink", O_RDONLY | O_NONBLOCK | O_DIRECTORY | O_CLOEXEC | O_NOFOLLOW);
>      if (target_dfd == -1) {
>          printf("This is right\n");
>      } else {
>          printf("This is broken\n");
>      }
>      return 0;
> }
>
> Many thanks to Peter Seebach for fixing the problem in the pseudo code
> to use the same logic which was already there for the
> AT_SYMLINK_NOFOLLOW.
>
> Also updated is the license MD5 checksum since the master branch of
> pseudo has had teh SPDX data updated.
>
> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
> ---
>   meta/recipes-devtools/pseudo/pseudo.inc    | 2 +-
>   meta/recipes-devtools/pseudo/pseudo_git.bb | 2 +-
>   2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/meta/recipes-devtools/pseudo/pseudo.inc b/meta/recipes-devtools/pseudo/pseudo.inc
> index 8b05735bb1..8b34909726 100644
> --- a/meta/recipes-devtools/pseudo/pseudo.inc
> +++ b/meta/recipes-devtools/pseudo/pseudo.inc
> @@ -4,7 +4,7 @@
>   
>   SUMMARY = "Pseudo gives fake root capabilities to a normal user"
>   HOMEPAGE = "http://git.yoctoproject.org/cgit/cgit.cgi/pseudo"
> -LIC_FILES_CHKSUM = "file://COPYING;md5=243b725d71bb5df4a1e5920b344b86ad"
> +LIC_FILES_CHKSUM = "file://COPYING;md5=a1d8023a6f953ac6ea4af765ff62d574"
>   SECTION = "base"
>   LICENSE = "LGPL2.1"
>   DEPENDS = "sqlite3 attr"
> diff --git a/meta/recipes-devtools/pseudo/pseudo_git.bb b/meta/recipes-devtools/pseudo/pseudo_git.bb
> index 51db84c4d4..3350c3fabd 100644
> --- a/meta/recipes-devtools/pseudo/pseudo_git.bb
> +++ b/meta/recipes-devtools/pseudo/pseudo_git.bb
> @@ -8,7 +8,7 @@ SRC_URI = "git://git.yoctoproject.org/pseudo \
>              file://toomanyfiles.patch \
>              "
>   
> -SRCREV = "3fa7c853e0bcd6fe23f7524c2a3c9e3af90901c3"
> +SRCREV = "097ca3e245200c4a4333964af59a106c42ff3bca"
>   S = "${WORKDIR}/git"
>   PV = "1.9.0+git${SRCPV}"
>   





* Re: [PATCH v2] pseudo: Upgrade to latest to fix openat() with a directory symlink [NAK]
  2019-08-01 21:37 ` [PATCH v2] pseudo: Upgrade to latest to fix openat() with a directory symlink [NAK] Jason Wessel
@ 2019-08-01 23:57   ` Seebs
  2019-08-02 16:27     ` Jason Wessel
  0 siblings, 1 reply; 15+ messages in thread
From: Seebs @ 2019-08-01 23:57 UTC (permalink / raw)
  To: Jason Wessel; +Cc: openembedded-core

On Thu, 1 Aug 2019 16:37:26 -0500
Jason Wessel <jason.wessel@windriver.com> wrote:

> It seems to have caused really odd problems with the oe link
> management that were not there previously, such as:
> 
> 
> WARNING: pinentry-1.1.0-r0 do_package_qa: QA Issue:
> pinentry: /usr/bin/pinentry is owned by uid 5002, which is the same
> as the user running bitbake. This may be due to host contamination
> [host-user-contaminated]
> 
> I'll continue to look into the problem.

There's a possibility that the right flag is something like
	(flags&O_NOFOLLOW)&&!(flags&O_PATH)

or something like that. There's a handful of references to this in
wrapfuncs.in in ports/unix and ports/linux.
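
For illustration only, the suggested condition can be written as a small
predicate. This is a minimal sketch, not code from pseudo: the helper name
wants_nofollow() is hypothetical, and the O_PATH fallback is an assumption
for platforms that do not define that flag.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* glibc exposes O_PATH only with _GNU_SOURCE */
#endif
#include <fcntl.h>

#ifndef O_PATH
#define O_PATH 0             /* fallback: no O_PATH on this platform */
#endif

/* Hypothetical helper, not taken from pseudo: nonzero when the caller
 * asked not to follow a trailing symlink and did not pass O_PATH. */
static int wants_nofollow(int flags)
{
    return (flags & O_NOFOLLOW) && !(flags & O_PATH);
}
```

With this shape, O_RDONLY | O_NOFOLLOW selects the no-follow path, while
plain O_RDONLY and O_PATH | O_NOFOLLOW do not.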

-s



* Re: [PATCH v2] pseudo: Upgrade to latest to fix openat() with a directory symlink [NAK]
  2019-08-01 23:57   ` Seebs
@ 2019-08-02 16:27     ` Jason Wessel
  2019-08-02 17:07       ` Seebs
  0 siblings, 1 reply; 15+ messages in thread
From: Jason Wessel @ 2019-08-02 16:27 UTC (permalink / raw)
  To: Seebs; +Cc: openembedded-core

It took a while to narrow this down to a concise test case, and I am not exactly sure what is going on in pseudo.  The C app is created based on mimicking exactly the python code that causes the failure, so that bitbake can be entirely removed from the picture.

If you use the master branch of pseudo with the C app below, it will print something like the following, but with a different owner uid if yours is not 5002.

===
Test 1 good
Test 2 good
Test 3 good
Test 4 good
Test 5 failed... tlink is owned by 5002 and not 0
===


The sequence of an openat() followed by an fstat() on the opened file handle erases the pseudo uid entry for the symlink, as shown by the subsequent lstat() in test 5. The culprit appears to be the fstat(), but it could be something much more complex than that...  The next step is to figure out why the recent change to openat() to address test case 1 caused this new problem.


==== test case app.c ====

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <dirent.h>
#include <unistd.h>
#include <fcntl.h>

int main()
{
    /* Tested with: gcc -Wall -o app app.c ; pseudo ./app */
    system("rm -rf tdir tlink");
    system("mkdir tdir");
    system("ln -s tdir tlink");
    DIR *dir = opendir(".");
    int dfd = dirfd(dir);

    /* Test 1: O_NOFOLLOW on a symlink to a directory must fail */
    int target_dfd = openat(dfd, "tlink", O_RDONLY | O_NONBLOCK | O_DIRECTORY | O_CLOEXEC | O_NOFOLLOW);
    if (target_dfd == -1) {
        printf("Test 1 good\n");
    } else {
        printf("Test 1 failed\n");
        close(target_dfd);
    }
    /* Test 2: without O_NOFOLLOW the symlink is followed and the open succeeds */
    target_dfd = openat(dfd, "tlink", O_RDONLY | O_NONBLOCK | O_DIRECTORY | O_CLOEXEC);
    if (target_dfd == -1) {
        printf("Test 2 failed\n");
    } else {
        printf("Test 2 good\n");
        close(target_dfd);
    }
    /* Test 3 make sure the owner of the link is root  */
    struct stat sbuf;
    if (!lstat("tlink", &sbuf) && sbuf.st_uid == 0) {
        printf("Test 3 good\n");
    } else {
        printf("Test 3 failed\n");
    }
    /* Test 4 tests open with the "rb" flag, owner should not change */
    int ofd = openat(dfd, "./tlink", O_RDONLY | O_CLOEXEC);
    if (ofd >= 0) {
        if (fstat(ofd, &sbuf) != 0)
            printf("ERROR in fstat test 4\n");
        else if (sbuf.st_uid == 0)
            printf("Test 4 good\n");
        else
            printf("Test 4 failed\n");
        close(ofd);
    } else {
        printf("Test 4 failed with openat()\n");
    }
    /* In pseudo, after the fstat above, it seems the db is corrupted */
    if (!lstat("tlink", &sbuf) && sbuf.st_uid == 0)
        printf("Test 5 good\n");
    else
        printf("Test 5 failed... tlink is owned by %i and not 0\n", sbuf.st_uid);

    return 0;
}



On 8/1/19 6:57 PM, Seebs wrote:
> On Thu, 1 Aug 2019 16:37:26 -0500
> Jason Wessel <jason.wessel@windriver.com> wrote:
> 
>> It seems to have caused really odd problems with the oe link
>> management that were not there previously, such as:
>>
>>
>> WARNING: pinentry-1.1.0-r0 do_package_qa: QA Issue:
>> pinentry: /usr/bin/pinentry is owned by uid 5002, which is the same
>> as the user running bitbake. This may be due to host contamination
>> [host-user-contaminated]
>>
>> I'll continue to look into the problem.
> 
> There's a possibility that the right flag is something like
> 	(flags&O_NOFOLLOW)&&!(flags&O_PATH)
> 
> or something like that. There's a handful of references to this in
> wrapfuncs.in in ports/unix and ports/linux.
> 
> -s
> 




* Re: [PATCH v2] pseudo: Upgrade to latest to fix openat() with a directory symlink [NAK]
  2019-08-02 16:27     ` Jason Wessel
@ 2019-08-02 17:07       ` Seebs
  2019-08-02 17:42         ` Seebs
  0 siblings, 1 reply; 15+ messages in thread
From: Seebs @ 2019-08-02 17:07 UTC (permalink / raw)
  To: Jason Wessel; +Cc: openembedded-core

On Fri, 2 Aug 2019 11:27:45 -0500
Jason Wessel <jason.wessel@windriver.com> wrote:

> The sequence of openat() followed by an fstat() on the opened file
> handle, will erase the pseudo uid entry for the symlink, as shown by
> the following lstat() in test 5. The culprit appears to be the
> fstat(), but it could be something much more complex than that...
> The next step is to figure out why the recent change to openat() to
> address test case 1, caused this new problem.

I suspect I know that one, although I'm not sure I know the details.

Pseudo will destroy entries of incompatible directory-entry types; for
instance, if it has the same path listed as both a plain file and a
directory. But consider, from openat.c:

#ifdef PSEUDO_NO_REAL_AT_FUNCTIONS
                rc = real___xstat64(_STAT_VER, path, &buf);
#else           
                rc = real___fxstatat64(_STAT_VER, dirfd, path, &buf, 0);
#endif

Note that there's no lstat, and no AT_SYMLINK_NOFOLLOW. Which is to say,
these stats will be following the symlink even though O_NOFOLLOW was
set. I can probably patch this in a bit.
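
As a standalone illustration of that difference (a minimal sketch, not the
pseudo patch itself), the stat flag can be derived from the open flags. The
helper names stat_for_open() and check_symlink_stat() are hypothetical, and
the scratch directory under /tmp is an assumption of the example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: stat `path` relative to `dirfd`, following a
 * trailing symlink only when the open flags allow it. */
static int stat_for_open(int dirfd, const char *path, int open_flags,
                         struct stat *buf)
{
    int at_flags = (open_flags & O_NOFOLLOW) ? AT_SYMLINK_NOFOLLOW : 0;
    return fstatat(dirfd, path, buf, at_flags);
}

/* Creates tdir and tlink -> tdir in a scratch directory and stats tlink
 * both ways. Returns 1 if the followed stat sees a directory and the
 * non-followed stat sees a symlink, 0 if not, -1 on setup failure. */
static int check_symlink_stat(void)
{
    char scratch[] = "/tmp/nofollow-XXXXXX";
    struct stat followed, not_followed;
    int ok = -1;

    if (mkdtemp(scratch) == NULL)
        return -1;
    int dfd = open(scratch, O_RDONLY | O_DIRECTORY);
    if (dfd >= 0) {
        if (mkdirat(dfd, "tdir", 0755) == 0 &&
            symlinkat("tdir", dfd, "tlink") == 0 &&
            stat_for_open(dfd, "tlink", O_RDONLY, &followed) == 0 &&
            stat_for_open(dfd, "tlink", O_RDONLY | O_NOFOLLOW, &not_followed) == 0)
            ok = S_ISDIR(followed.st_mode) && S_ISLNK(not_followed.st_mode);
        /* Best-effort cleanup of the scratch entries */
        unlinkat(dfd, "tlink", 0);
        unlinkat(dfd, "tdir", AT_REMOVEDIR);
        close(dfd);
    }
    rmdir(scratch);
    return ok;
}
```

A stat with flag 0 describes the directory the link points at, so a caller
that set O_NOFOLLOW ends up with information about the wrong object.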

-s



* Re: [PATCH v2] pseudo: Upgrade to latest to fix openat() with a directory symlink [NAK]
  2019-08-02 17:07       ` Seebs
@ 2019-08-02 17:42         ` Seebs
  2019-08-03 12:33           ` Khem Raj
  0 siblings, 1 reply; 15+ messages in thread
From: Seebs @ 2019-08-02 17:42 UTC (permalink / raw)
  To: Jason Wessel; +Cc: openembedded-core

On Fri, 2 Aug 2019 12:07:33 -0500
Seebs <seebs@seebs.net> wrote:

> Note that there's no lstat, and no AT_SYMLINK_NOFOLLOW. Which is to
> say, these stats will be following the symlink even though O_NOFOLLOW
> was set. I can probably patch this in a bit.

Followup: Patch applied to master. In addition to fixing the stat
calls, I had to use `flags&O_NOFOLLOW` rather than
`flags|O_NOFOLLOW`.

I am sort of amazed at how much DIDN'T break right away with that one.

-s



* Re: [PATCH v2] pseudo: Upgrade to latest to fix openat() with a directory symlink [NAK]
  2019-08-02 17:42         ` Seebs
@ 2019-08-03 12:33           ` Khem Raj
  2019-08-03 14:23             ` Seebs
  0 siblings, 1 reply; 15+ messages in thread
From: Khem Raj @ 2019-08-03 12:33 UTC (permalink / raw)
  To: Seebs; +Cc: openembedded-core


On Fri, Aug 2, 2019 at 10:49 AM Seebs <seebs@seebs.net> wrote:

> On Fri, 2 Aug 2019 12:07:33 -0500
> Seebs <seebs@seebs.net> wrote:
>
> > Note that there's no lstat, and no AT_SYMLINK_NOFOLLOW. Which is to
> > say, these stats will be following the symlink even though O_NOFOLLOW
> > was set. I can probably patch this in a bit.
>
> Followup: Patch applied to master, but also in addition to fixing the
> stat calls, I had to use `flags&O_NOFOLLOW` rather than
> `flags|O_NOFOLLOW`.
>
> I am sort of amazed at how much DIDN'T break right away with that one.
>

Will this fix the file ownership issue that we see with Glibc-locale
packages from time to time?

>
> -s
> --
> _______________________________________________
> Openembedded-core mailing list
> Openembedded-core@lists.openembedded.org
> http://lists.openembedded.org/mailman/listinfo/openembedded-core
>


* Re: [PATCH v2] pseudo: Upgrade to latest to fix openat() with a directory symlink [NAK]
  2019-08-03 12:33           ` Khem Raj
@ 2019-08-03 14:23             ` Seebs
  2019-08-03 14:57               ` Khem Raj
  2019-08-05 22:42               ` Bystricky, Juro
  0 siblings, 2 replies; 15+ messages in thread
From: Seebs @ 2019-08-03 14:23 UTC (permalink / raw)
  To: Khem Raj; +Cc: openembedded-core

On Sat, 3 Aug 2019 05:33:46 -0700
Khem Raj <raj.khem@gmail.com> wrote:

> Will this fix the file ownership issue that we see with Glibc-locale
> packages from time to time?

I have no idea. Since I haven't got a reliable reproducer for it, I
can't test it in a sane way.

-s



* Re: [PATCH v2] pseudo: Upgrade to latest to fix openat() with a directory symlink [NAK]
  2019-08-03 14:23             ` Seebs
@ 2019-08-03 14:57               ` Khem Raj
  2019-08-04 16:13                 ` Mark Hatle
  2019-08-05 14:23                 ` Jason Wessel
  2019-08-05 22:42               ` Bystricky, Juro
  1 sibling, 2 replies; 15+ messages in thread
From: Khem Raj @ 2019-08-03 14:57 UTC (permalink / raw)
  To: Seebs; +Cc: Patches and discussions about the oe-core layer

I see the locale issue at least 5-7 times a week on world builds, so I
will be able to see if that frequency stays the same after this fix.

On Sat, Aug 3, 2019 at 7:23 AM Seebs <seebs@seebs.net> wrote:
>
> On Sat, 3 Aug 2019 05:33:46 -0700
> Khem Raj <raj.khem@gmail.com> wrote:
>
> > Will this fix the file ownership issue that we see with Glibc-locale
> > packages from time to time?
>
> I have no idea. Since I haven't got a reliable reproducer for it, I
> can't test it in a sane way.
>
> -s



* Re: [PATCH v2] pseudo: Upgrade to latest to fix openat() with a directory symlink [NAK]
  2019-08-03 14:57               ` Khem Raj
@ 2019-08-04 16:13                 ` Mark Hatle
  2019-08-05 14:23                 ` Jason Wessel
  1 sibling, 0 replies; 15+ messages in thread
From: Mark Hatle @ 2019-08-04 16:13 UTC (permalink / raw)
  To: Khem Raj, Seebs; +Cc: Patches and discussions about the oe-core layer

On 8/3/19 9:57 AM, Khem Raj wrote:
> I see the locale issue atleast 5-7 times a week on world builds so I
> will be able to see if that frequency stays same after this fix.
Internally we turned it into an error due to the frequency... out of 3-4k+
builds a week I think we're seeing it maybe 20 times?

We've not seen any correlation to specific distro or configuration either...

--Mark

> On Sat, Aug 3, 2019 at 7:23 AM Seebs <seebs@seebs.net> wrote:
>>
>> On Sat, 3 Aug 2019 05:33:46 -0700
>> Khem Raj <raj.khem@gmail.com> wrote:
>>
>>> Will this fix the file ownership issue that we see with Glibc-locale
>>> packages from time to time?
>>
>> I have no idea. Since I haven't got a reliable reproducer for it, I
>> can't test it in a sane way.
>>
>> -s




* Re: [PATCH v2] pseudo: Upgrade to latest to fix openat() with a directory symlink [NAK]
  2019-08-03 14:57               ` Khem Raj
  2019-08-04 16:13                 ` Mark Hatle
@ 2019-08-05 14:23                 ` Jason Wessel
  1 sibling, 0 replies; 15+ messages in thread
From: Jason Wessel @ 2019-08-05 14:23 UTC (permalink / raw)
  To: Khem Raj, Seebs; +Cc: Patches and discussions about the oe-core layer

If we understood more about the nature of the race condition a test case could probably be constructed. For now, it is worth a try to see if it is any better.  I am certain the timing will change ever so slightly, so we could hit the glibc-locale issue more or less...

All of the regression tests completed over the weekend and the new code which found the problem in the first place is also working well. I'll send a new patch out to move pseudo forward.

Cheers,
Jason.


On 8/3/19 9:57 AM, Khem Raj wrote:
> I see the locale issue atleast 5-7 times a week on world builds so I
> will be able to see if that frequency stays same after this fix.
>
> On Sat, Aug 3, 2019 at 7:23 AM Seebs <seebs@seebs.net> wrote:
>> On Sat, 3 Aug 2019 05:33:46 -0700
>> Khem Raj <raj.khem@gmail.com> wrote:
>>
>>> Will this fix the file ownership issue that we see with Glibc-locale
>>> packages from time to time?
>> I have no idea. Since I haven't got a reliable reproducer for it, I
>> can't test it in a sane way.
>>
>> -s





* Re: [PATCH v2] pseudo: Upgrade to latest to fix openat() with a directory symlink [NAK]
  2019-08-03 14:23             ` Seebs
  2019-08-03 14:57               ` Khem Raj
@ 2019-08-05 22:42               ` Bystricky, Juro
  2019-08-06  6:51                 ` Martin Jansa
  1 sibling, 1 reply; 15+ messages in thread
From: Bystricky, Juro @ 2019-08-05 22:42 UTC (permalink / raw)
  To: Seebs, Khem Raj; +Cc: openembedded-core

I can reproduce the problem fairly easily (and, sadly, even with the latest commit, 060058bb29f70b244e685b3c704eb0641b736f73).
In my case, it seems easy to reproduce if I have 40+ threads running.
The reproducer script (below) typically fails within the first 10 iterations.


#!/bin/bash

fname='glibc-locale_master_august8'
max=1000
for (( i=1; i <= $max; i++ ))
do
    echo "$i/$max  ${fname}_$i.log"
    # Redirect stdout first, then stderr, so both streams reach the target
    bitbake glibc-locale -c cleanall > /dev/null 2>&1
    bitbake glibc-locale > "${fname}_$i.log" 2>&1
    if grep -q "host-user-contaminated" "${fname}_$i.log"; then
        echo "error !"
        exit 2
    #else
        #rm ${fname}_$i.log
    fi

done

________________________________________
From: openembedded-core-bounces@lists.openembedded.org [openembedded-core-bounces@lists.openembedded.org] on behalf of Seebs [seebs@seebs.net]
Sent: Saturday, August 03, 2019 7:23 AM
To: Khem Raj
Cc: openembedded-core@lists.openembedded.org
Subject: Re: [OE-core] [PATCH v2] pseudo: Upgrade to latest to fix openat() with a directory symlink [NAK]

On Sat, 3 Aug 2019 05:33:46 -0700
Khem Raj <raj.khem@gmail.com> wrote:

> Will this fix the file ownership issue that we see with Glibc-locale
> packages from time to time?

I have no idea. Since I haven't got a reliable reproducer for it, I
can't test it in a sane way.

-s



* Re: [PATCH v2] pseudo: Upgrade to latest to fix openat() with a directory symlink [NAK]
  2019-08-05 22:42               ` Bystricky, Juro
@ 2019-08-06  6:51                 ` Martin Jansa
  2019-08-14 16:02                   ` Randy MacLeod
  0 siblings, 1 reply; 15+ messages in thread
From: Martin Jansa @ 2019-08-06  6:51 UTC (permalink / raw)
  To: Bystricky, Juro; +Cc: openembedded-core


This is the same reproducer I am using in:
https://bugzilla.yoctoproject.org/show_bug.cgi?id=12434
but with this SRCREV I haven't reproduced it yet in first 500 iterations,
so it's definitely improving for me (used to reproduce it at least once in
first 500 iterations)

Now I'm testing the reproducer with "qmake -install qinstall".

Regards,

On Tue, Aug 6, 2019 at 12:43 AM Bystricky, Juro <juro.bystricky@intel.com>
wrote:

> I can reproduce the problem fairly easily  (and, sadly even with the
> latest commits as 060058bb29f70b244e685b3c704eb0641b736f73 ).
> In my case, it seems easy to reproduce if I have 40+ threads running.
> The reproducer script (below) fails typically within the first 10
> iterations.
>
>
> #!/bin/bash
>
> fname='glibc-locale_master_august8'
> max=1000
> for (( i=1; i <= $max; i++ ))
> do
>     echo "$i/$max  ${fname}_$i.log"
>     bitbake glibc-locale -c cleanall 2>&1 > /dev/null
>     bitbake glibc-locale 2>&1 > ${fname}_$i.log
>      if grep -q "host-user-contaminated" ${fname}_$i.log; then
>         echo "error !"
>       exit 2
>     #else
>       #rm ${fname}_$i.log
>     fi
>
> done
>
> ________________________________________
> From: openembedded-core-bounces@lists.openembedded.org [
> openembedded-core-bounces@lists.openembedded.org] on behalf of Seebs [
> seebs@seebs.net]
> Sent: Saturday, August 03, 2019 7:23 AM
> To: Khem Raj
> Cc: openembedded-core@lists.openembedded.org
> Subject: Re: [OE-core] [PATCH v2] pseudo: Upgrade to latest to fix
> openat() with a directory symlink [NAK]
>
> On Sat, 3 Aug 2019 05:33:46 -0700
> Khem Raj <raj.khem@gmail.com> wrote:
>
> > Will this fix the file ownership issue that we see with Glibc-locale
> > packages from time to time?
>
> I have no idea. Since I haven't got a reliable reproducer for it, I
> can't test it in a sane way.
>
> -s
>


* Re: [PATCH v2] pseudo: Upgrade to latest to fix openat() with a directory symlink [NAK]
  2019-08-06  6:51                 ` Martin Jansa
@ 2019-08-14 16:02                   ` Randy MacLeod
  2019-08-14 17:09                     ` Martin Jansa
  0 siblings, 1 reply; 15+ messages in thread
From: Randy MacLeod @ 2019-08-14 16:02 UTC (permalink / raw)
  To: Martin Jansa, Bystricky, Juro; +Cc: openembedded-core

On 8/6/19 2:51 AM, Martin Jansa wrote:
> This is the same reproducer I am using in:
> https://bugzilla.yoctoproject.org/show_bug.cgi?id=12434
> but with this SRCREV I haven't reproduced it yet in first 500 
> iterations, so it's definitely improving for me (used to reproduce it at 
> least once in first 500 iterations)
> 
> Now I'm testing the reproducer with "qmake -install qinstall".

Any update Martin?



Using a variation of Juro's script and adding a little stress-ng load,
it _seems_ that I can make the problem happen more quickly than without
system stress but it's a shared system so _seems_ is underlined.

Using stress-ng was supposed to be a quick check to see if I could
get the reproducer down to minutes rather than around an hour.

Results are promising so I'll continue to use this approach as
I add debugging to pseudo and add an inline, immediate check in
the context of:
 
http://cgit.openembedded.org/openembedded-core/tree/meta/recipes-core/glibc/glibc-locale.inc?h=master#n72
to see if the UID/GID are equal to my UID/GID.
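
As a rough sketch of what such an immediate ownership check amounts to (in
C for illustration only; the real check would live in the recipe, and the
helper name path_has_owner() is hypothetical), the path is examined with
lstat() so that a symlink itself is checked rather than its target. The
uid and gid to compare against are passed in, on the assumption that the
build user's real IDs are obtained outside of pseudo, since calls made
under pseudo report the emulated identity.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if `path` itself (symlinks are not
 * followed) is owned by `uid` or has group `gid`, 0 if neither matches,
 * and -1 if lstat() fails. */
static int path_has_owner(const char *path, uid_t uid, gid_t gid)
{
    struct stat sb;

    if (lstat(path, &sb) != 0)
        return -1;
    return sb.st_uid == uid || sb.st_gid == gid;
}
```

A result of 1 for the build user's IDs on a packaged file would correspond
to the host-user-contaminated condition discussed in this thread.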

Test runs summaries are below.

../Randy



cat src/distro/yocto/b/uid-diff/glibc-locale-stress
#!/bin/bash

fname='glibc-locale_master_august13'
max=100
for (( i=1; i <= $max; i++ ))
do
     echo "$i/$max  ${fname}_$i.log"
     # Redirect stdout first, then stderr, so both streams reach the target
     bitbake glibc-locale -c cleanall > /dev/null 2>&1
     # add some stress
     stress-ng -t 1000 --switch 8 --switch-freq 50000 &
     bitbake glibc-locale > "${fname}_$i.log" 2>&1
     # Destress
     killall -9 stress-ng
     if grep -q "host-user-contaminated" "${fname}_$i.log"; then
         echo "error !"
         exit 2
     #else
         #rm ${fname}_$i.log
     fi
done


On a (shared) system where lscpu shows 128 cores
and no stress:

Trial   Iteration Error
1       44
2       19


stress-ng -t 1000 --switch 8 --switch-freq 50000

50000 was just the frequency that generated a high enough
but not too high load. On this system, each process used ~30% of a CPU.

Trial   Iteration Error
1       3
2       18


stress-ng -t 1000 --switch 16 --switch-freq 50000

Trial   Iteration Error
1       3
2       1
3       11

stress-ng -t 1000 --switch 32 --switch-freq 50000

Trial   Iteration Error
1       2
2       9
3       8


stress-ng -t 1000 --switch 64 --switch-freq 50000

Trial   Iteration Error
1       4
2       13
3       >6


stress-ng -t 1000 --mq 64
  128 processes using 98% cpu each

Trial   Iteration Error
1       14
2       NaN

Trial 2 was precluded by other users of the shared system complaining!
The idea was to cause more rapid context switches. Later, I might try
this again with say 16 workers. If anyone has a better idea, please
reply.

EOM

> 
> Regards,
> 
> On Tue, Aug 6, 2019 at 12:43 AM Bystricky, Juro 
> <juro.bystricky@intel.com <mailto:juro.bystricky@intel.com>> wrote:
> 
>     I can reproduce the problem fairly easily  (and, sadly even with the
>     latest commits as 060058bb29f70b244e685b3c704eb0641b736f73 ).
>     In my case, it seems easy to reproduce if I have 40+ threads running.
>     The reproducer script (below) fails typically within the first 10
>     iterations.
> 
> 
>     #!/bin/bash
> 
>     fname='glibc-locale_master_august8'
>     max=1000
>     for (( i=1; i <= $max; i++ ))
>     do
>          echo "$i/$max  ${fname}_$i.log"
>          bitbake glibc-locale -c cleanall 2>&1 > /dev/null
>          bitbake glibc-locale 2>&1 > ${fname}_$i.log
>           if grep -q "host-user-contaminated" ${fname}_$i.log; then
>              echo "error !"
>            exit 2
>          #else
>            #rm ${fname}_$i.log
>          fi
> 
>     done
> 
>     ________________________________________
>     From: openembedded-core-bounces@lists.openembedded.org
>     <mailto:openembedded-core-bounces@lists.openembedded.org>
>     [openembedded-core-bounces@lists.openembedded.org
>     <mailto:openembedded-core-bounces@lists.openembedded.org>] on behalf
>     of Seebs [seebs@seebs.net <mailto:seebs@seebs.net>]
>     Sent: Saturday, August 03, 2019 7:23 AM
>     To: Khem Raj
>     Cc: openembedded-core@lists.openembedded.org
>     <mailto:openembedded-core@lists.openembedded.org>
>     Subject: Re: [OE-core] [PATCH v2] pseudo: Upgrade to latest to fix
>     openat() with a directory symlink [NAK]
> 
>     On Sat, 3 Aug 2019 05:33:46 -0700
>     Khem Raj <raj.khem@gmail.com <mailto:raj.khem@gmail.com>> wrote:
> 
>      > Will this fix the file ownership issue that we see with Glibc-locale
>      > packages from time to time?
> 
>     I have no idea. Since I haven't got a reliable reproducer for it, I
>     can't test it in a sane way.
> 
>     -s
> 
> 


-- 
# Randy MacLeod
# Wind River Linux



* Re: [PATCH v2] pseudo: Upgrade to latest to fix openat() with a directory symlink [NAK]
  2019-08-14 16:02                   ` Randy MacLeod
@ 2019-08-14 17:09                     ` Martin Jansa
  0 siblings, 0 replies; 15+ messages in thread
From: Martin Jansa @ 2019-08-14 17:09 UTC (permalink / raw)
  To: Randy MacLeod; +Cc: openembedded-core

The "qmake -install qinstall" reproducer from the bugzilla ticket still
reproduces the issue every time, even with the latest pseudo, but that
might have a different root cause than the glibc-locale issue.

On Wed, Aug 14, 2019 at 6:02 PM Randy MacLeod <randy.macleod@windriver.com>
wrote:

> On 8/6/19 2:51 AM, Martin Jansa wrote:
> > This is the same reproducer I am using in:
> > https://bugzilla.yoctoproject.org/show_bug.cgi?id=12434
> > but with this SRCREV I haven't reproduced it yet in first 500
> > iterations, so it's definitely improving for me (used to reproduce it at
> > least once in first 500 iterations)
> >
> > Now I'm testing the reproducer with "qmake -install qinstall".
>
> Any update Martin?
>
>
>
> Using a variation of Juro's script and adding a little stress-ng load,
> it _seems_ that I can make the problem happen more quickly than without
> system stress but it's a shared system so _seems_ is underlined.
>
> Using stress-ng was supposed to be a quick check to see if I could
> get the reproducer down to minutes rather than around an hour.
>
> Results are promising so I'll continue to use this approach as
> I add debugging to pseudo and add an inline, immediate check in
> the context of:
>
>
> http://cgit.openembedded.org/openembedded-core/tree/meta/recipes-core/glibc/glibc-locale.inc?h=master#n72
> to see if the UID/GID are equal to my UID/GID.
>
> Test runs summaries are below.
>
> ../Randy
>
>
>
> cat src/distro/yocto/b/uid-diff/glibc-locale-stress
> #!/bin/bash
>
> fname='glibc-locale_master_august13'
> max=100
> for (( i=1; i <= $max; i++ ))
> do
>      echo "$i/$max  ${fname}_$i.log"
>      # Redirect stdout first so that stderr follows it into the same target
>      bitbake glibc-locale -c cleanall > /dev/null 2>&1
>      # add some stress
>      stress-ng -t 1000 --switch 8 --switch-freq 50000 &
>      bitbake glibc-locale > "${fname}_$i.log" 2>&1
>      # Destress
>      killall -9 stress-ng
>      if grep -q "host-user-contaminated" "${fname}_$i.log"; then
>          echo "error !"
>          exit 2
>      #else
>          #rm "${fname}_$i.log"
>      fi
> done
>
>
> On a (shared) system where lscpu shows 128 cores
> and no stress:
>
> Trial   Iteration of first error
> 1       44
> 2       19
>
>
> stress-ng -t 1000 --switch 8 --switch-freq 50000
>
> 50000 was just the frequency that generated a load that was high enough
> but not too high. On this system, each process used ~30% of a CPU.
>
> Trial   Iteration of first error
> 1       3
> 2       18
>
>
> stress-ng -t 1000 --switch 16 --switch-freq 50000
>
> Trial   Iteration of first error
> 1       3
> 2       1
> 3       11
>
> stress-ng -t 1000 --switch 32 --switch-freq 50000
>
> Trial   Iteration of first error
> 1       2
> 2       9
> 3       8
>
>
> stress-ng -t 1000 --switch 64 --switch-freq 50000
>
> Trial   Iteration of first error
> 1       4
> 2       13
> 3       >6
>
>
> stress-ng -t 1000 --mq 64
>   128 processes using 98% cpu each
>
> Trial   Iteration of first error
> 1       14
> 2       NaN
>
> Trial 2 was precluded by other users of the shared system complaining!
> The idea was to cause more rapid context switches. Later, I might try
> this again with say 16 workers. If anyone has a better idea, please
> reply.
>
> EOM
>
> >
> > Regards,
> >
> > On Tue, Aug 6, 2019 at 12:43 AM Bystricky, Juro
> > <juro.bystricky@intel.com> wrote:
> >
> >     I can reproduce the problem fairly easily (and, sadly, even with the
> >     latest commits, as of 060058bb29f70b244e685b3c704eb0641b736f73).
> >     In my case, it seems easy to reproduce if I have 40+ threads running.
> >     The reproducer script (below) typically fails within the first 10
> >     iterations.
> >
> >
> >     #!/bin/bash
> >
> >     fname='glibc-locale_master_august8'
> >     max=1000
> >     for (( i=1; i <= $max; i++ ))
> >     do
> >          echo "$i/$max  ${fname}_$i.log"
> >          # Redirect stdout first so that stderr follows it into the same target
> >          bitbake glibc-locale -c cleanall > /dev/null 2>&1
> >          bitbake glibc-locale > "${fname}_$i.log" 2>&1
> >          if grep -q "host-user-contaminated" "${fname}_$i.log"; then
> >              echo "error !"
> >              exit 2
> >          #else
> >              #rm "${fname}_$i.log"
> >          fi
> >
> >     done
> >
> >     ________________________________________
> >     From: openembedded-core-bounces@lists.openembedded.org
> >     [openembedded-core-bounces@lists.openembedded.org] on behalf
> >     of Seebs [seebs@seebs.net]
> >     Sent: Saturday, August 03, 2019 7:23 AM
> >     To: Khem Raj
> >     Cc: openembedded-core@lists.openembedded.org
> >     Subject: Re: [OE-core] [PATCH v2] pseudo: Upgrade to latest to fix
> >     openat() with a directory symlink [NAK]
> >
> >     On Sat, 3 Aug 2019 05:33:46 -0700
> >     Khem Raj <raj.khem@gmail.com> wrote:
> >
> >      > Will this fix the file ownership issue that we see with
> >      > Glibc-locale packages from time to time?
> >
> >     I have no idea. Since I haven't got a reliable reproducer for it, I
> >     can't test it in a sane way.
> >
> >     -s
> >     --
> >     _______________________________________________
> >     Openembedded-core mailing list
> >     Openembedded-core@lists.openembedded.org
> >     http://lists.openembedded.org/mailman/listinfo/openembedded-core
> >
> >
>
>
> --
> # Randy MacLeod
> # Wind River Linux
>

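The inline UID/GID check mentioned in the quoted message could look roughly
like the sketch below. It is an illustration only: find_host_owned is a
hypothetical helper, not something taken from pseudo or OE-core, and it
assumes a find that supports -uid and -gid (GNU findutils does).

```shell
#!/bin/bash
# Hypothetical helper (an assumption for illustration, not part of pseudo
# or OE-core): print every path under a directory whose numeric owner or
# group matches the given host IDs.
#
#   $1  directory to scan
#   $2  numeric UID of the real (host) build user
#   $3  numeric GID of the real (host) build user
find_host_owned() {
    local dir="$1" uid="$2" gid="$3"
    # -uid / -gid compare against numeric IDs, so no passwd lookup is needed.
    find "$dir" \( -uid "$uid" -o -gid "$gid" \) -print
}
```

A call such as find_host_owned some/install/dir 1000 1000 (path and IDs are
placeholders) would list any file that leaked the host identity. The IDs are
passed in explicitly rather than read with "id -u" inside the function,
because a shell running under pseudo would normally report the faked root
identity instead of the real one.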

end of thread, other threads:[~2019-08-14 17:09 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-01 20:03 [PATCH v2] pseudo: Upgrade to latest to fix openat() with a directory symlink Jason Wessel
2019-08-01 21:37 ` [PATCH v2] pseudo: Upgrade to latest to fix openat() with a directory symlink [NAK] Jason Wessel
2019-08-01 23:57   ` Seebs
2019-08-02 16:27     ` Jason Wessel
2019-08-02 17:07       ` Seebs
2019-08-02 17:42         ` Seebs
2019-08-03 12:33           ` Khem Raj
2019-08-03 14:23             ` Seebs
2019-08-03 14:57               ` Khem Raj
2019-08-04 16:13                 ` Mark Hatle
2019-08-05 14:23                 ` Jason Wessel
2019-08-05 22:42               ` Bystricky, Juro
2019-08-06  6:51                 ` Martin Jansa
2019-08-14 16:02                   ` Randy MacLeod
2019-08-14 17:09                     ` Martin Jansa
