* [pseudo] Pseudo 1.8+ xattr sqlite corruption
@ 2018-08-20 18:45 Jack.Fewx
  2018-08-22  8:41 ` Alexander Kanavin
  0 siblings, 1 reply; 22+ messages in thread
From: Jack.Fewx @ 2018-08-20 18:45 UTC (permalink / raw)
  To: yocto

We are encountering a build problem after migrating to Poky 2.3 and Pseudo 1.8.1, and need help to resolve this.
It is hampering our development efforts, forcing us to rebuild images frequently.

Background:
Our build applies SELinux file contexts at build time, since our rootfs is read-only.
In Poky 2.0, using Pseudo 1.6.2, this works perfectly 100% of the time.

Problem:
Since the upgrade to 2.3, there is a 33%+ chance that the SELinux context labels will be corrupt at the end of the build.
The failures are random. Cleaning and rebuilding a bad image target results in success or failure with the same likelihood. We can go days without an error, or, like this weekend, have all 12 builds fail!

Failure mode:
We have learned to identify the failure and mark builds bad based on the contents of the Pseudo SQLite database generated by the image build.

A good build has unique inode-to-xattr references in the "xattrs" table.  We determine pass/fail by querying all entries and the unique entries, then verifying that the two counts match.
Example of a good result, sorted by "ino":
Id       dev      ino          name                value
"1"      "64773"  "251402120"  "security.selinux"  system_u:object_r:root_t
"10012"  "64773"  "251402121"  "security.selinux"  system_u:object_r:var_t
"7293"   "64773"  "251402124"  "security.selinux"  system_u:object_r:lib_t
"19"     "64773"  "251402133"  "security.selinux"  system_u:object_r:var_run_t

On a bad build, there will be numerous duplicates in this table.  I do not know why this causes the failure, but we have found it to be a reliable indicator of failure without having to flash the image onto something.
Example of a bad result, again sorted by "ino":

Id       dev   ino           name                value
"10067"  "45"  "2293256211"  "security.selinux"  system_u:object_r:usr_t
"31918"  "45"  "2293256211"  "security.selinux"  system_u:object_r:usr_t
"59307"  "45"  "2293256211"  "security.selinux"  system_u:object_r:usr_t
"61317"  "45"  "2293256211"  "security.selinux"  system_u:object_r:usr_t
"61737"  "45"  "2293256211"  "security.selinux"  system_u:object_r:usr_t
"61793"  "45"  "2293256211"  "security.selinux"  system_u:object_r:usr_t
"11849"  "45"  "2293250079"  "security.selinux"  system_u:object_r:var_spool_t
"66928"  "45"  "2293250079"  "security.selinux"  system_u:object_r:var_spool_t
"66948"  "45"  "2293250079"  "security.selinux"  system_u:object_r:var_spool_t
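
The pass/fail check described above (total rows versus unique inode-to-xattr references) can be sketched in C roughly as follows. This is only an illustration: the xattr_row struct and both function names are hypothetical, it assumes the rows have already been read out of the "xattrs" table into memory, and it treats (dev, ino, name) as the key that should be unique.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory copy of one row of the "xattrs" table
 * (only the columns that form the lookup key). */
struct xattr_row {
    int64_t dev;
    int64_t ino;
    const char *name;
};

/* Order rows by (dev, ino, name) so that duplicates end up adjacent. */
static int cmp_row(const void *a, const void *b) {
    const struct xattr_row *x = a;
    const struct xattr_row *y = b;
    if (x->dev != y->dev) return x->dev < y->dev ? -1 : 1;
    if (x->ino != y->ino) return x->ino < y->ino ? -1 : 1;
    return strcmp(x->name, y->name);
}

/* Sorts the rows in place and returns the number of distinct keys. */
size_t count_unique_keys(struct xattr_row *rows, size_t n) {
    if (n == 0) return 0;
    qsort(rows, n, sizeof rows[0], cmp_row);
    size_t unique = 1;
    for (size_t i = 1; i < n; i++) {
        if (cmp_row(&rows[i - 1], &rows[i]) != 0) unique++;
    }
    return unique;
}

/* A table looks sane when every row has a distinct key,
 * i.e. the unique count equals the total count. */
int xattrs_look_sane(struct xattr_row *rows, size_t n) {
    return count_unique_keys(rows, n) == n;
}
```

Fed the rows of the good example, xattrs_look_sane() returns nonzero; fed the bad example, the unique count is smaller than the row count and it returns 0.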

Any help would be greatly appreciated.

Jack Fewx
Software Senior Principal Engineer
Dell EMC | Server and Infrastructure Systems
jack_fewx@dell.com



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [pseudo] Pseudo 1.8+ xattr sqlite corruption
  2018-08-20 18:45 [pseudo] Pseudo 1.8+ xattr sqlite corruption Jack.Fewx
@ 2018-08-22  8:41 ` Alexander Kanavin
  2018-08-22 13:58   ` Jack.Fewx
  0 siblings, 1 reply; 22+ messages in thread
From: Alexander Kanavin @ 2018-08-22  8:41 UTC (permalink / raw)
  To: Jack.Fewx; +Cc: Yocto discussion list

2018-08-20 20:45 GMT+02:00  <Jack.Fewx@dell.com>:
> We are encountering a build problem after migrating to Poky 2.3 and Pseudo 1.8.1, and need help to resolve this.
> It is hampering our development efforts, forcing us to rebuild images frequently.
>
> Background:
> Our build applies SELinux file contexts, during build time since our rootfs is read-only
> In Poky 2.0, using Pseudo 1.6.2 this works perfectly 100% of the time

Pseudo is not an integral part of poky, and comes via a recipe build
like anything else. You can play with that recipe, and establish which
commit in pseudo's upstream git repo broke this. Also 2.3 is better
than 2.0 but still kind of old. Do try this on the latest yocto
release, or even on the master branch.

Alex


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [pseudo] Pseudo 1.8+ xattr sqlite corruption
  2018-08-22  8:41 ` Alexander Kanavin
@ 2018-08-22 13:58   ` Jack.Fewx
  2018-08-22 14:41     ` Joshua Watt
  2018-08-22 14:44     ` Alexander Kanavin
  0 siblings, 2 replies; 22+ messages in thread
From: Jack.Fewx @ 2018-08-22 13:58 UTC (permalink / raw)
  To: alex.kanavin; +Cc: yocto

Dell - Internal Use - Confidential  

> 2018-08-20 20:45 GMT+02:00  <Jack.Fewx@dell.com>:
>> We are encountering a build problem after migrating to Poky 2.3 and Pseudo 1.8.1, and need help to resolve this.
>> It is hampering our development efforts, forcing us to rebuild images frequently.
>>
>> Background:
>> Our build applies SELinux file contexts, during build time since our 
>> rootfs is read-only In Poky 2.0, using Pseudo 1.6.2 this works 
>> perfectly 100% of the time

> Pseudo is not an integral part of poky, and comes via a recipe build like anything else. You can play with that recipe, and establish which commit in pseudo's upstream git repo broke this. Also 2.3 is better than 2.0 but still kind of old. Do try this on the latest yocto release, or even on the master branch.

> Alex

I should add that the same problem exists in Poky 2.5, and at the top of the Pseudo git repo.  As best I can tell, the problem was introduced when the entire Pseudo database structure was rewritten.  Because of that major overhaul, messing with individual patches is somewhere between problematic and impossible.

If Pseudo is not a part of Poky proper, yet is a completely integral part of the build, is there a better place to field this question?

Jack

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [pseudo] Pseudo 1.8+ xattr sqlite corruption
  2018-08-22 13:58   ` Jack.Fewx
@ 2018-08-22 14:41     ` Joshua Watt
  2018-08-22 14:54       ` Jack.Fewx
  2018-08-22 14:44     ` Alexander Kanavin
  1 sibling, 1 reply; 22+ messages in thread
From: Joshua Watt @ 2018-08-22 14:41 UTC (permalink / raw)
  To: Jack.Fewx, alex.kanavin; +Cc: yocto

On Wed, 2018-08-22 at 13:58 +0000, Jack.Fewx@dell.com wrote:
> Dell - Internal Use - Confidential  
> 
> > 2018-08-20 20:45 GMT+02:00  <Jack.Fewx@dell.com>:
> > > We are encountering a build problem after migrating to Poky 2.3
> > > and Pseudo 1.8.1, and need help to resolve this.
> > > It is hampering our development efforts, forcing us to rebuild
> > > images frequently.
> > > 
> > > Background:
> > > Our build applies SELinux file contexts, during build time since
> > > our 
> > > rootfs is read-only In Poky 2.0, using Pseudo 1.6.2 this works 
> > > perfectly 100% of the time
> > Pseudo is not an integral part of poky, and comes via a recipe
> > build like anything else. You can play with that recipe, and
> > establish which commit in pseudo's upstream git repo broke this.
> > Also 2.3 is better than 2.0 but still kind of old. Do try this on
> > the latest yocto release, or even on the master branch.
> > Alex
> 
> I should add that the same problem exists in Poky 2.5, and top of
> Pseudo git repo.  The problem was introduced, best I can tell, was
> when the entire Pseudo database structure was rewritten.  As a result
> of the major overhaul messing with patches is problematic to
> impossible.
> 
> If Pseudo is not a part of Poky proper, yet is a completely integral
> part of the build, is there a better place to field this question?
> 

Out of curiosity, what is the failure mode here? Are there any
indications of a failed build in the output, or do you have to look at
the sqlite database to tell something is wrong?

> Jack
-- 
Joshua Watt <JPEWhacker@gmail.com>


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [pseudo] Pseudo 1.8+ xattr sqlite corruption
  2018-08-22 13:58   ` Jack.Fewx
  2018-08-22 14:41     ` Joshua Watt
@ 2018-08-22 14:44     ` Alexander Kanavin
  1 sibling, 0 replies; 22+ messages in thread
From: Alexander Kanavin @ 2018-08-22 14:44 UTC (permalink / raw)
  To: Jack.Fewx, seebs; +Cc: Yocto discussion list

2018-08-22 15:58 GMT+02:00  <Jack.Fewx@dell.com>:
> I should add that the same problem exists in Poky 2.5, and top of Pseudo git repo.  The problem was introduced, best I can tell, was when the entire Pseudo database structure was rewritten.  As a result of the major overhaul messing with patches is problematic to impossible.
>
> If Pseudo is not a part of Poky proper, yet is a completely integral part of the build, is there a better place to field this question?

The pseudo man is Mr. Seebs :) Let's gently ping him about it.

Alex


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [pseudo] Pseudo 1.8+ xattr sqlite corruption
  2018-08-22 14:41     ` Joshua Watt
@ 2018-08-22 14:54       ` Jack.Fewx
  2018-08-22 15:09         ` Richard Purdie
  2018-08-22 16:41         ` Seebs
  0 siblings, 2 replies; 22+ messages in thread
From: Jack.Fewx @ 2018-08-22 14:54 UTC (permalink / raw)
  To: jpewhacker, alex.kanavin, seebs; +Cc: yocto

> Dell - Internal Use - Confidential
> 
> > 2018-08-20 20:45 GMT+02:00  <Jack.Fewx@dell.com>:
> > > We are encountering a build problem after migrating to Poky 2.3 
> > > and Pseudo 1.8.1, and need help to resolve this.
> > > It is hampering our development efforts, forcing us to rebuild 
> > > images frequently.
> > > 
> > > Background:
> > > Our build applies SELinux file contexts, during build time since 
> > > our rootfs is read-only In Poky 2.0, using Pseudo 1.6.2 this works 
> > > perfectly 100% of the time
> > Pseudo is not an integral part of poky, and comes via a recipe build 
> > like anything else. You can play with that recipe, and establish 
> > which commit in pseudo's upstream git repo broke this.
> > Also 2.3 is better than 2.0 but still kind of old. Do try this on 
> > the latest yocto release, or even on the master branch.
> > Alex
> 
> I should add that the same problem exists in Poky 2.5, and top of 
> Pseudo git repo.  The problem was introduced, best I can tell, was 
> when the entire Pseudo database structure was rewritten.  As a result 
> of the major overhaul messing with patches is problematic to 
> impossible.
> 
> If Pseudo is not a part of Poky proper, yet is a completely integral 
> part of the build, is there a better place to field this question?
> 

> Out of curiosity, what is the failure mode here? Are there any indications of a failed build in the output, or do you have to look at the sqlite database to tell something is wrong?

> Jack
> --
> Joshua Watt <JPEWhacker@gmail.com>

So the failure mode is that the target filesystem is devoid of SELinux file contexts: all files are unlabeled_t, which breaks pretty much everything in enforcing mode.  Whatever the cause and effect of the corruption in the Pseudo database, the end result is that when mksquashfs runs, it can't get labels for the files.

There are no obvious differences in the pseudo.log files between good and bad runs, so it's nothing Pseudo is screaming about.

I just found the Pseudo debug option flags, and how to insert them using FAKEROOTENV += "PSEUDO_DEBUG=Dx", so I'm running builds trying to get good and bad ones with the debug logs.

Jack

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [pseudo] Pseudo 1.8+ xattr sqlite corruption
  2018-08-22 14:54       ` Jack.Fewx
@ 2018-08-22 15:09         ` Richard Purdie
  2018-08-22 15:32           ` Jack.Fewx
  2018-09-18 20:26           ` Jack.Fewx
  2018-08-22 16:41         ` Seebs
  1 sibling, 2 replies; 22+ messages in thread
From: Richard Purdie @ 2018-08-22 15:09 UTC (permalink / raw)
  To: Jack.Fewx, jpewhacker, alex.kanavin, seebs; +Cc: yocto

On Wed, 2018-08-22 at 14:54 +0000, Jack.Fewx@dell.com wrote:
> So failure mode is the target filesystem is devoid of SELinux file
> contexts, all files are unlabeled_t, which pretty much breaks
> everything in enforcing mode.  So whatever the corruption
> cause/effect in the Pseudo database, the end result is when
> Mksquashfs runs it can't get labels for the files.
> 
> There is no obvious differences in the pseudo.log files between good
> and bad runs, so it's nothing Pseudo is screaming about.
> 
> I just found the Pseudo debug option flags, and how to insert them
> using FAKEROOTENV += "PSEUDO_DEBUG=Dx", so I'm running builds trying
> to get good and bad ones with the debug logs.

It's not clear if you already tried this, but if not, it'd probably be
worth updating pseudo to the latest version too, to see if it was some bug
we already addressed in pseudo. I know we've had a few challenges
supporting xattrs in there...

Cheers,

Richard


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [pseudo] Pseudo 1.8+ xattr sqlite corruption
  2018-08-22 15:09         ` Richard Purdie
@ 2018-08-22 15:32           ` Jack.Fewx
  2018-09-18 20:26           ` Jack.Fewx
  1 sibling, 0 replies; 22+ messages in thread
From: Jack.Fewx @ 2018-08-22 15:32 UTC (permalink / raw)
  To: richard.purdie, jpewhacker, alex.kanavin, seebs; +Cc: yocto

> On Wed, 2018-08-22 at 14:54 +0000, Jack.Fewx@dell.com wrote:
> > So failure mode is the target filesystem is devoid of SELinux file 
> > contexts, all files are unlabeled_t, which pretty much breaks 
> > everything in enforcing mode.  So whatever the corruption cause/effect 
> > in the Pseudo database, the end result is when Mksquashfs runs it 
> > can't get labels for the files.
> > 
> > There is no obvious differences in the pseudo.log files between good 
> > and bad runs, so it's nothing Pseudo is screaming about.
> > 
> > I just found the Pseudo debug option flags, and how to insert them 
> > using FAKEROOTENV += "PSEUDO_DEBUG=Dx", so I'm running builds trying 
> > to get good and bad ones with the debug logs.
>
> Its not clear if you already tried this but if not, it'd probably be worth updating pseudo to the latest version too, see if it was some bug we already addressed in pseudo. I know we've had a few challenges supporting xattrs in there...
>
> Cheers,
>
> Richard

We have a "bleeding edge" test environment for staging our next upgrades, and I see Poky 2.5.1 just dropped, so I'll set that up and give it a whirl, and pull down the top of the Pseudo git tree as well. The only issue is that whatever patch fixes this will need to be backported into our 2.3 environment. The next release is stabilized on 2.3 for shipment soon, so I can't yank the rug out from under that one. If we just have to limp along it'll be annoying, but not the end of the world, as we can get some good builds out.

Our 2.3 environment is behaving itself again for the moment, so it will take a bit to get debug logs. Sounds like I have some homework to do; I'll keep everyone posted.

And thanks to all for your quick responses.

Jack Fewx	

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [pseudo] Pseudo 1.8+ xattr sqlite corruption
  2018-08-22 14:54       ` Jack.Fewx
  2018-08-22 15:09         ` Richard Purdie
@ 2018-08-22 16:41         ` Seebs
  1 sibling, 0 replies; 22+ messages in thread
From: Seebs @ 2018-08-22 16:41 UTC (permalink / raw)
  To: Jack.Fewx; +Cc: yocto

On Wed, 22 Aug 2018 14:54:02 +0000
<Jack.Fewx@dell.com> wrote:

> So failure mode is the target filesystem is devoid of SELinux file
> contexts, all files are unlabeled_t, which pretty much breaks
> everything in enforcing mode.  So whatever the corruption
> cause/effect in the Pseudo database, the end result is when
> Mksquashfs runs it can't get labels for the files.

Ugh. Sorry, this is a known issue, I think we have an open bug for it,
and so far as I could tell the last time I looked at it, it was
theoretically-impossible to fix it sanely.

See:

https://bugzilla.yoctoproject.org/show_bug.cgi?id=6580

The basic problem is: SELinux is extended attributes, and if we are
allowing *any* use of extended attributes, there is no way for pseudo
to distinguish between "host environment is trying to set a host
environment extended attribute" and "build system is trying to set a
target environment extended attribute".

And we can partially address this by turning off xattr support, so all
extended attribute use gets ENOSYS or whatever, but then I think the
host system stuff will also fail.

I am open to suggestions on ways this could be addressed sanely, but I
haven't come up with anything good yet.

(FWIW, I'm more present on the oe-core list, which I still scan for
messages with "pseudo" in the subject line.)

-s


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [pseudo] Pseudo 1.8+ xattr sqlite corruption
  2018-08-22 15:09         ` Richard Purdie
  2018-08-22 15:32           ` Jack.Fewx
@ 2018-09-18 20:26           ` Jack.Fewx
  2018-09-18 21:09             ` Seebs
  1 sibling, 1 reply; 22+ messages in thread
From: Jack.Fewx @ 2018-09-18 20:26 UTC (permalink / raw)
  To: richard.purdie, jpewhacker, alex.kanavin, seebs; +Cc: yocto

> > On Wed, 2018-08-22 at 14:54 +0000, Jack.Fewx@dell.com wrote:
> > > So failure mode is the target filesystem is devoid of SELinux file 
> > > contexts, all files are unlabeled_t, which pretty much breaks 
> > > everything in enforcing mode.  So whatever the corruption 
> > > cause/effect in the Pseudo database, the end result is when 
> > > Mksquashfs runs it can't get labels for the files.
> > > 
> > > There is no obvious differences in the pseudo.log files between good 
> > > and bad runs, so it's nothing Pseudo is screaming about.
> > > 
> > > I just found the Pseudo debug option flags, and how to insert them 
> > > using FAKEROOTENV += "PSEUDO_DEBUG=Dx", so I'm running builds trying 
> > > to get good and bad ones with the debug logs.
> >
> > Its not clear if you already tried this but if not, it'd probably be worth updating pseudo to the latest version too, see if it was some bug we already addressed in pseudo. I know we've had a few challenges supporting xattrs in there...
> >
> > Cheers,
> >
> > Richard
> 
> We have a "bleeding edge" test environment for staging our next upgrades, and I see Poky 2.5.1 just dropped, so I'll set that up and give it a whirl. And pull down the top of the Pseudo git tree as well. The only issue is whatever patch fixes this, I will need to backported into our 2.3 environment. The next release is stabilized on 2.3 for shipment soon, so I can't yank the rug out from under that one. If we just have to limp along it'll be annoying, but not the end of the world as we can get some good builds out.
> 
> Our 2.3 environment is behaving itself again for the moment, so it will take a bit to get debug logs. Sounds like I have some homework to do, I'll keep everyone posted.
> 
> And thanks to all for your quick responses.
> 
> Jack Fewx

Update! After running a number of test builds and seeing passes and failures, I found the root cause of our issue.  This one took a while.

The inode values in our build system get too large.  Meaning they exceed the MAX value of a SIGNED 64-bit integer.  As long as they are under that limit the build is fine, as soon as the signed values become "negative" it all gets screwed up.

Good build (paths redacted): 

get-xattr: path '...image/1.0.0-r0.0/rootfs/var/backups', oldpath 'security.selinux' [16/189]
requested xattr named 'security.selinux' for ino 1573104496
set-xattr: path '...image/1.0.0-r0.0/rootfs/var/backups', oldpath 'security.selinux' [50/223]
trying to set a value for ino 1573104496: name is 'security.selinux' [16/50 bytes], value is 'system_u:object_r:backup_store_t' [33 bytes]. Existing row -1.

get-xattr: path '...image/1.0.0-r0.0/rootfs/var/backups', oldpath 'security.selinux' [16/189]
requested xattr named 'security.selinux' for ino 1573104496
got 33-byte results: 'system_u:object_r:backup_store_t'
get results: 'system_u:object_r:backup_store_t' (33 bytes)

list-xattr: path '...image/1.0.0-r0.0/rootfs/var/backups', oldpath '' [-1/172]
got 17 bytes of xattrs to list: security.selinux

Bad build:

get-xattr: path '...image/1.0.0-r0.0/rootfs/var/backups', oldpath 'security.selinux' [16/189]
requested xattr named 'security.selinux' for ino 2983570948
set-xattr: path '...image/1.0.0-r0.0/rootfs/var/backups', oldpath 'security.selinux' [50/223]
trying to set a value for ino 2983570948: name is 'security.selinux' [16/50 bytes], value is 'system_u:object_r:backup_store_t' [33 bytes]. Existing row -1.	

get-xattr: path '...image/1.0.0-r0.0/rootfs/var/backups', oldpath 'security.selinux' [16/189]
requested xattr named 'security.selinux' for ino 2983570948

list-xattr: path '...image/1.0.0-r0.0/rootfs/var/backups', oldpath '' [-1/172]
got 0 bytes of xattrs to list:

The inode number in the latter example is too large, so every lookup fails, and the xattr is set multiple times in the database yet can't be read back out during mkfs.

SO... any suggestions how to make the inodes in the database an UNSIGNED value?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [pseudo] Pseudo 1.8+ xattr sqlite corruption
  2018-09-18 20:26           ` Jack.Fewx
@ 2018-09-18 21:09             ` Seebs
  2018-09-18 21:16               ` Joshua Watt
  0 siblings, 1 reply; 22+ messages in thread
From: Seebs @ 2018-09-18 21:09 UTC (permalink / raw)
  To: Jack.Fewx; +Cc: yocto

On Tue, 18 Sep 2018 20:26:59 +0000
<Jack.Fewx@dell.com> wrote:

> SO... any suggestions how to make the inodes in the database an
> UNSIGNED value?

We probably *can't* -- sqlite doesn't support that! They cap out at 8
byte integer values, and are always signed. I don't know of a way to
fix this. We might be able to trick it by coercing them into the signed
range, and reversing the conversion later. And this is outside the
range that's accurately representable in float, too. Whee!
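
A minimal sketch of that coercion idea, assuming the inode arrives as an unsigned 64-bit value. The names ino_to_db and ino_from_db are hypothetical, not pseudo functions, and this is not necessarily what any eventual fix does; it only shows that the bit pattern can be stored as a signed 64-bit integer and recovered unchanged.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret an unsigned 64-bit inode as a signed 64-bit value for
 * storage. memcpy copies the bit pattern and avoids relying on the
 * implementation-defined result of an out-of-range cast. */
int64_t ino_to_db(uint64_t ino) {
    int64_t stored;
    memcpy(&stored, &ino, sizeof stored);
    return stored;
}

/* Reverse the conversion when the value is read back. */
uint64_t ino_from_db(int64_t stored) {
    uint64_t ino;
    memcpy(&ino, &stored, sizeof ino);
    return ino;
}
```

Because the mapping is one-to-one, equality lookups on the stored value keep working. The caveat is ordering: anything above INT64_MAX is stored as a negative number, so it would sort before the small inodes.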

-s


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [pseudo] Pseudo 1.8+ xattr sqlite corruption
  2018-09-18 21:09             ` Seebs
@ 2018-09-18 21:16               ` Joshua Watt
  2018-09-18 21:20                 ` Seebs
  0 siblings, 1 reply; 22+ messages in thread
From: Joshua Watt @ 2018-09-18 21:16 UTC (permalink / raw)
  To: Seebs; +Cc: yocto

On Tue, Sep 18, 2018, 16:09 Seebs <seebs@seebs.net> wrote:

> On Tue, 18 Sep 2018 20:26:59 +0000
> <Jack.Fewx@dell.com> wrote:
>
> > SO... any suggestions how to make the inodes in the database an
> > UNSIGNED value?
>
> We probably *can't* -- sqlite doesn't support that! They cap out at 8
> byte integer values, and are always signed. I don't know of a way to
> fix this. We might be able to trick it by coercing them into the signed
> range, and reversing the conversion later. And this is outside the
> range that's accurately representable in float, too. Whee!
>

Are the databases supposed to be shareable between different build
machines? IIRC, the answer is no. Could you store the native inode type as
a sqlite BLOB? Not necessarily a good idea.... Just an idea.


> -s
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [pseudo] Pseudo 1.8+ xattr sqlite corruption
  2018-09-18 21:16               ` Joshua Watt
@ 2018-09-18 21:20                 ` Seebs
  2018-09-19 11:33                   ` Burton, Ross
  0 siblings, 1 reply; 22+ messages in thread
From: Seebs @ 2018-09-18 21:20 UTC (permalink / raw)
  To: Joshua Watt; +Cc: yocto

On Tue, 18 Sep 2018 16:16:22 -0500
Joshua Watt <jpewhacker@gmail.com> wrote:

> Are the databases supposed to be shareable between different build
> machines? IIRC, the answer is no. Could you store the native inode
> type as a sqlite BLOB? Not necessarily a good idea.... Just an idea.

I think coercing the values into range is probably safer. It should be
trivial enough...

-s


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [pseudo] Pseudo 1.8+ xattr sqlite corruption
  2018-09-18 21:20                 ` Seebs
@ 2018-09-19 11:33                   ` Burton, Ross
  2018-09-19 14:39                     ` Seebs
  2018-09-20 19:16                     ` Seebs
  0 siblings, 2 replies; 22+ messages in thread
From: Burton, Ross @ 2018-09-19 11:33 UTC (permalink / raw)
  To: Seebs, Jack.Fewx; +Cc: Yocto-mailing-list

On Tue, 18 Sep 2018 at 22:21, Seebs <seebs@seebs.net> wrote:
> > Are the databases supposed to be shareable between different build
> > machines? IIRC, the answer is no. Could you store the native inode
> > type as a sqlite BLOB? Not necessarily a good idea.... Just an idea.
>
> I think coercing the values into range is probably safer. It should be
> trivial enough...

That is an excellent catch and I'm hopeful that this explains the
failures in glibc-locales too that I still see occasionally.

Is anyone actually writing a patch?

Ross


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [pseudo] Pseudo 1.8+ xattr sqlite corruption
  2018-09-19 11:33                   ` Burton, Ross
@ 2018-09-19 14:39                     ` Seebs
  2018-09-19 16:25                       ` Jack.Fewx
  2018-09-20 19:16                     ` Seebs
  1 sibling, 1 reply; 22+ messages in thread
From: Seebs @ 2018-09-19 14:39 UTC (permalink / raw)
  To: Burton, Ross; +Cc: Yocto-mailing-list

On Wed, 19 Sep 2018 12:33:37 +0100
"Burton, Ross" <ross.burton@intel.com> wrote:

> On Tue, 18 Sep 2018 at 22:21, Seebs <seebs@seebs.net> wrote:
> > > Are the databases supposed to be shareable between different build
> > > machines? IIRC, the answer is no. Could you store the native inode
> > > type as a sqlite BLOB? Not necessarily a good idea.... Just an
> > > idea.
> >
> > I think coercing the values into range is probably safer. It should
> > be trivial enough...
> 
> That is an excellent catch and I'm hopeful that this explains the
> failures in glibc-locales too that I still see occasionally.
> 
> Is anyone actually writing a patch?

I can try to get a proposed patch out sometime soon, but I don't have an
easy way to check it.

-s


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [pseudo] Pseudo 1.8+ xattr sqlite corruption
  2018-09-19 14:39                     ` Seebs
@ 2018-09-19 16:25                       ` Jack.Fewx
  0 siblings, 0 replies; 22+ messages in thread
From: Jack.Fewx @ 2018-09-19 16:25 UTC (permalink / raw)
  To: seebs, ross.burton; +Cc: yocto

> On Wed, 19 Sep 2018 12:33:37 +0100
> "Burton, Ross" <ross.burton@intel.com> wrote:
> 
> > On Tue, 18 Sep 2018 at 22:21, Seebs <seebs@seebs.net> wrote:
> > > > Are the databases supposed to be shareable between different build
> > > > machines? IIRC, the answer is no. Could you store the native inode
> > > > type as a sqlite BLOB? Not necessarily a good idea.... Just an
> > > > idea.
> > >
> > > I think coercing the values into range is probably safer. It should
> > > be trivial enough...
> > 
> > That is an excellent catch and I'm hopeful that this explains the
> > failures in glibc-locales too that I still see occasionally.
> > 
> > Is anyone actually writing a patch?
> 
> I can try to get a proposed patch out sometime soon, I don't have an
> easy way to check it.
> 
> -s

You can send the patch to me, it is easy to reproduce here.

I think "coercing" the values is a good path. When the inodes are high, they are usually all high, except for the rare case where the inode values fall just on either side of the signed limit.  That happened one time and was the final proof of why this was happening.

A little more background: we have two build environments, the developer workstations and the automated build cluster.  The developers never saw this, because their machines are stand-alone boxes with typical 4 TB hard drives and max inode counts of around a couple million. Our automated build system is a bunch of virtual machines which are set up and torn down with each build.  What we noticed, though, is that each newly created builder gets a new inode starting value, which is an increment from the previous ones.  This is why completely restarting the build cluster 'fixed' the issue for a while: it reset the inode numbering. So each builder gets assigned a new, non-overlapping block of inodes, e.g. 1 - 1M, 1M+1 - 2M, 2M+1 - 3M, etc.  Eventually this climbs up into the range above 1.5B and the problems begin. The build managers are investigating whether there is a way in the cluster config to limit the inode upper limit and force the numbers to wrap around sooner.

One thing I am curious about: Pseudo 1.6.x never gave us this problem. Was the reference inside the database different? Or maybe it's just a case of never hitting the issue.

Thanks,

Jack Fewx
jack.fewx@dell.com


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [pseudo] Pseudo 1.8+ xattr sqlite corruption
  2018-09-19 11:33                   ` Burton, Ross
  2018-09-19 14:39                     ` Seebs
@ 2018-09-20 19:16                     ` Seebs
  2018-09-20 20:41                       ` Jack.Fewx
  1 sibling, 1 reply; 22+ messages in thread
From: Seebs @ 2018-09-20 19:16 UTC (permalink / raw)
  To: Burton, Ross; +Cc: Yocto-mailing-list

On Wed, 19 Sep 2018 12:33:37 +0100
"Burton, Ross" <ross.burton@intel.com> wrote:

> Is anyone actually writing a patch?

I have a tentative fix for this checked into master, but I don't know
whether it actually works because I don't have any inodes over 2^63.

It doesn't seem to have *broken* anything, though, in casual testing.
It's not carefully tested.

-s


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [pseudo] Pseudo 1.8+ xattr sqlite corruption
  2018-09-20 19:16                     ` Seebs
@ 2018-09-20 20:41                       ` Jack.Fewx
  2018-09-20 20:46                         ` Seebs
  2018-09-20 20:50                         ` Seebs
  0 siblings, 2 replies; 22+ messages in thread
From: Jack.Fewx @ 2018-09-20 20:41 UTC (permalink / raw)
  To: seebs, ross.burton; +Cc: yocto

> On Wed, 19 Sep 2018 12:33:37 +0100
> "Burton, Ross" <ross.burton@intel.com> wrote:
>
> > Is anyone actually writing a patch?
>
> I have a tentative fix for this checked into master, I don't know
> whether it actually works because I don't have any inodes over 2^63.
>
> It doesn't seem to have *broken* anything, though in casual testing.
> It's not carefully tested.
> 
> -s

My sincerest apologies, my "maths" failed me.  The number was correct, the bounds were wrong.  Any inode above 0x7FFFFFFF (2147483647) fails... HOWEVER, that's not 64 bits, that's 32 bits! (DUH JACK!)

Okay, so I found the problem, and I have a patch for it.  Throughout the pseudo_db.c code, the handling of the msg->ino objects is not consistent.  Many get passed to the sqlite3_bind_int64 function, but NOT ALL.  Nine instances in the code, including all of those in the xattr code, use only sqlite3_bind_int, which truncates the integers to signed 32 bits.
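
To illustrate the truncation, here is a small stand-alone sketch that does not call SQLite at all. narrow_like_bind_int is a hypothetical stand-in for what typically happens when a 64-bit inode is passed to a parameter of type int, which is the type of the value argument of sqlite3_bind_int. It assumes a 32-bit int and the usual two's-complement wraparound, which strictly speaking is implementation-defined in C.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for narrowing a 64-bit inode to a 32-bit int:
 * keep the low 32 bits and reinterpret them as a signed value. */
int32_t narrow_like_bind_int(uint64_t ino) {
    uint32_t low = (uint32_t)(ino & 0xFFFFFFFFu);
    int32_t narrowed;
    memcpy(&narrowed, &low, sizeof narrowed);
    return narrowed;
}

/* Nonzero when the inode fits in a signed 32-bit integer and would
 * therefore pass through the narrowing unchanged. */
int survives_bind_int(uint64_t ino) {
    return ino <= (uint64_t)INT32_MAX;
}
```

With the inode numbers from the debug logs, 1573104496 (the good build) passes through unchanged, while 2983570948 (the bad build) comes out as -1311396348.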

Fixing those 9 entries resolves the issue. Patch follows:

Index: git/pseudo_db.c
===================================================================
--- git.orig/pseudo_db.c
+++ git/pseudo_db.c
@@ -1512,9 +1512,9 @@ pdb_clear_xattrs(pseudo_msg_t *msg) {
 		}
 	}
 	sqlite3_bind_int(delete, 1, msg->dev);
-	sqlite3_bind_int(delete, 2, msg->ino);
+	sqlite3_bind_int64(delete, 2, msg->ino);
 	sqlite3_bind_int(delete, 3, msg->dev);
-	sqlite3_bind_int(delete, 4, msg->ino);
+	sqlite3_bind_int64(delete, 4, msg->ino);
 	rc = sqlite3_step(delete);
 	if (rc != SQLITE_DONE) {
 		dberr(file_db, "delete of unused xattrs may have failed");
@@ -1549,9 +1549,9 @@ pdb_copy_xattrs(pseudo_msg_t *oldmsg, ps
 		}
 	}
 	sqlite3_bind_int(copy, 1, msg->dev);
-	sqlite3_bind_int(copy, 2, msg->ino);
+	sqlite3_bind_int64(copy, 2, msg->ino);
 	sqlite3_bind_int(copy, 3, oldmsg->dev);
-	sqlite3_bind_int(copy, 4, oldmsg->ino);
+	sqlite3_bind_int64(copy, 4, oldmsg->ino);
 	rc = sqlite3_step(copy);
 	if (rc != SQLITE_DONE) {
 		dberr(file_db, "copy of xattrs may have failed");
@@ -1581,7 +1581,7 @@ pdb_check_xattrs(pseudo_msg_t *msg) {
 	}
 	int existing;
 	sqlite3_bind_int(scan, 1, msg->dev);
-	sqlite3_bind_int(scan, 2, msg->ino);
+	sqlite3_bind_int64(scan, 2, msg->ino);
 	rc = sqlite3_step(scan);
 	if (rc == SQLITE_ROW) {
 		existing = (int) sqlite3_column_int64(scan, 0);
@@ -2471,7 +2471,7 @@ pdb_get_xattr(pseudo_msg_t *msg, char **
 	}
 	pseudo_debug(PDBGF_XATTR, "requested xattr named '%s' for ino %lld\n", *value, (long long) msg->ino);
 	sqlite3_bind_int(select, 1, msg->dev);
-	sqlite3_bind_int(select, 2, msg->ino);
+	sqlite3_bind_int64(select, 2, msg->ino);
 	rc = sqlite3_bind_text(select, 3, *value, -1, SQLITE_STATIC);
 	if (rc) {
 		dberr(file_db, "couldn't bind xattr name to SELECT.");
@@ -2533,7 +2533,7 @@ pdb_list_xattr(pseudo_msg_t *msg, char *
 		}
 	}
 	sqlite3_bind_int(select, 1, msg->dev);
-	sqlite3_bind_int(select, 2, msg->ino);
+	sqlite3_bind_int64(select, 2, msg->ino);
 	do {
 		rc = sqlite3_step(select);
 		if (rc == SQLITE_ROW) {
@@ -2587,7 +2587,7 @@ pdb_remove_xattr(pseudo_msg_t *msg, char
 		}
 	}
 	sqlite3_bind_int(delete, 1, msg->dev);
-	sqlite3_bind_int(delete, 2, msg->ino);
+	sqlite3_bind_int64(delete, 2, msg->ino);
 	rc = sqlite3_bind_text(delete, 3, value, len, SQLITE_STATIC);
 	if (rc) {
 		dberr(file_db, "couldn't bind xattr name to DELETE.");
@@ -2628,7 +2628,7 @@ pdb_set_xattr(pseudo_msg_t *msg, char *v
 		}
 	}
 	sqlite3_bind_int(select, 1, msg->dev);
-	sqlite3_bind_int(select, 2, msg->ino);
+	sqlite3_bind_int64(select, 2, msg->ino);
 	rc = sqlite3_bind_text(select, 3, value, -1, SQLITE_STATIC);
 	if (rc) {
 		dberr(file_db, "couldn't bind xattr name to SELECT.");



* Re: [pseudo] Pseudo 1.8+ xattr sqlite corruption
  2018-09-20 20:41                       ` Jack.Fewx
@ 2018-09-20 20:46                         ` Seebs
  2018-09-20 20:50                         ` Seebs
  1 sibling, 0 replies; 22+ messages in thread
From: Seebs @ 2018-09-20 20:46 UTC (permalink / raw)
  To: Jack.Fewx; +Cc: yocto

On Thu, 20 Sep 2018 20:41:02 +0000
<Jack.Fewx@dell.com> wrote:

> Okay, so I found the problem, and I have a patch for it.  Throughout
> the pseudo_db.c code, the handling of the msg->ino objects is not
> consistent.  Many get passed to the sqlite3_bind_int64 function, but
> NOT ALL.  9 instances in the code, including all of them in the xattr
> code use only sqlite3_bind_int which truncates the integers to signed
> 32-bits.

D'oh. I should have noticed that since I was looking at the code.

Thanks, that's a very good fix.

-s



* Re: [pseudo] Pseudo 1.8+ xattr sqlite corruption
  2018-09-20 20:41                       ` Jack.Fewx
  2018-09-20 20:46                         ` Seebs
@ 2018-09-20 20:50                         ` Seebs
  2018-09-21 12:50                           ` Burton, Ross
  1 sibling, 1 reply; 22+ messages in thread
From: Seebs @ 2018-09-20 20:50 UTC (permalink / raw)
  To: Jack.Fewx; +Cc: yocto

Nice catch. Merged a patch that applies this also.

-s



* Re: [pseudo] Pseudo 1.8+ xattr sqlite corruption
  2018-09-20 20:50                         ` Seebs
@ 2018-09-21 12:50                           ` Burton, Ross
  2018-09-23 13:23                             ` Martin Jansa
  0 siblings, 1 reply; 22+ messages in thread
From: Burton, Ross @ 2018-09-21 12:50 UTC (permalink / raw)
  To: Seebs; +Cc: Yocto-mailing-list

Wired: I've just sent a patch to update oe-core to use the current
HEAD of pseudo.
Tired: WARNING: glibc-locale-2.28-r0 do_package_qa: QA Issue:
glibc-locale: /glibc-binary-localedata-en-za.iso-8859-1/usr/lib/locale/en_ZA.ISO-8859-1/LC_PAPER
is owned by uid 1000, which is the same as the user running bitbake.
This may be due to host contamination [host-user-contaminated]

I was *really* hoping this would solve the problem.

Ross

On Thu, 20 Sep 2018 at 21:51, Seebs <seebs@seebs.net> wrote:
>
> Nice catch. Merged a patch that applies this also.
>
> -s



* Re: [pseudo] Pseudo 1.8+ xattr sqlite corruption
  2018-09-21 12:50                           ` Burton, Ross
@ 2018-09-23 13:23                             ` Martin Jansa
  0 siblings, 0 replies; 22+ messages in thread
From: Martin Jansa @ 2018-09-23 13:23 UTC (permalink / raw)
  To: Burton, Ross; +Cc: Seebs, Yocto Project

I did the same SRCREV bump locally and can confirm that bug 12434,
"pseudo: Incorrect UID/GID in packaged files"
(https://bugzilla.yoctoproject.org/show_bug.cgi?id=12434),
is still reproducible:
ERROR: glibc-locale-2.24-r0 do_package_qa: QA Issue: glibc-locale:
/glibc-binary-localedata-sw-ke/usr/lib/locale/sw_KE/LC_NUMERIC is owned by
uid 1101, which is the same as the user running bitbake. This may be due to
host contamination [host-user-contaminated]

On Fri, Sep 21, 2018 at 2:51 PM Burton, Ross <ross.burton@intel.com> wrote:

> Wired: I've just sent a patch to update oe-core to use the current
> HEAD of pseudo.
> Tired: WARNING: glibc-locale-2.28-r0 do_package_qa: QA Issue:
> glibc-locale:
> /glibc-binary-localedata-en-za.iso-8859-1/usr/lib/locale/en_ZA.ISO-8859-1/LC_PAPER
> is owned by uid 1000, which is the same as the user running bitbake.
> This may be due to host contamination [host-user-contaminated]
>
> I was *really* hoping this would solve the problem.
>
> Ross
>
> On Thu, 20 Sep 2018 at 21:51, Seebs <seebs@seebs.net> wrote:
> >
> > Nice catch. Merged a patch that applies this also.
> >
> > -s


Thread overview: 22+ messages
2018-08-20 18:45 [pseudo] Pseudo 1.8+ xattr sqlite corruption Jack.Fewx
2018-08-22  8:41 ` Alexander Kanavin
2018-08-22 13:58   ` Jack.Fewx
2018-08-22 14:41     ` Joshua Watt
2018-08-22 14:54       ` Jack.Fewx
2018-08-22 15:09         ` Richard Purdie
2018-08-22 15:32           ` Jack.Fewx
2018-09-18 20:26           ` Jack.Fewx
2018-09-18 21:09             ` Seebs
2018-09-18 21:16               ` Joshua Watt
2018-09-18 21:20                 ` Seebs
2018-09-19 11:33                   ` Burton, Ross
2018-09-19 14:39                     ` Seebs
2018-09-19 16:25                       ` Jack.Fewx
2018-09-20 19:16                     ` Seebs
2018-09-20 20:41                       ` Jack.Fewx
2018-09-20 20:46                         ` Seebs
2018-09-20 20:50                         ` Seebs
2018-09-21 12:50                           ` Burton, Ross
2018-09-23 13:23                             ` Martin Jansa
2018-08-22 16:41         ` Seebs
2018-08-22 14:44     ` Alexander Kanavin