All of lore.kernel.org
 help / color / mirror / Atom feed
* OOPS: linux-4.9.33/fs/pnode.c:propagate_one()
@ 2017-09-28 13:03 Philipp Hahn
  2017-09-28 17:00 ` Eric W. Biederman
  0 siblings, 1 reply; 4+ messages in thread
From: Philipp Hahn @ 2017-09-28 13:03 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-fsdevel, linux-kernel, Felix Botner, Dirk Wiesenthal

Hello Eric,

we have observed the following OOPS with linux-4.9.33 on several
occasions when starting a Docker container:

> [  531.801537] Oops: 0000 [#1] SMP
> [  531.801565] Modules linked in: xt_nat veth ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user xfrm_algo xt_addrtype br_netfilter bridge stp llc overlay ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_tcpudp nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_mangle ip6table_filter ip6_tables xt_conntrack iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables rpcsec_gss_krb5 nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc joydev pcspkr serio_raw virtio_balloon i2c_piix4 evdev quota_v2 quota_tree parport_pc ppdev lp parport autofs4 ext4 crc16 jbd2 fscrypto mbcache dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log dm_mod hid_generic usbhid hid ata_generic cirrus virtio_net virtio_blk ttm ata_piix libata uhci_hcd ehci_hcd psmouseSep 28 13:30:33 master70 kernel: [  531.802171]  drm_kms_helper scsi_mod floppy usbcore virtio_pci button drm virtio_ring virtio usb_common
> [  531.802247] CPU: 0 PID: 1668 Comm: dockerd Not tainted 4.9.0-ucs104-amd64 #1 Debian 4.9.30-2A~4.2.0.201706171152
> [  531.802310] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
> [  531.802347] task: ffff918af7ccaf40 task.stack: ffffb47a40b54000
> [  531.802385] RIP: 0010:[<ffffffff91234020>]  [<ffffffff91234020>] propagate_one+0x160/0x1e0
> [  531.802444] RSP: 0018:ffffb47a40b57e08  EFLAGS: 00010282
> [  531.802479] RAX: 0000000000000001 RBX: ffff918adf43cd80 RCX: 0000000000000000
> [  531.802524] RDX: ffff918ad69209c0 RSI: ffff918adf43cd80 RDI: ffff918ad69209c0
> [  531.802569] RBP: ffff918afb3a2480 R08: 0000000000000001 R09: ffff918ac0183e40
> [  531.802614] R10: 00000000000011b8 R11: 00000000000176b9 R12: ffffb47a40b57e58
> [  531.802670] R13: 0000000000000000 R14: ffff918adf43cd80 R15: ffff918afbe83600
> [  531.802716] FS:  00007f6157fff700(0000) GS:ffff918affc00000(0000) knlGS:0000000000000000
> [  531.802767] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  531.802804] CR2: 0000000000000010 CR3: 0000000039e60000 CR4: 00000000000006f0
> [  531.802852] Stack:
> [  531.802868]  ffff918adf43cd80 ffff918afb3a2480 ffffffff912344af ffff918ac0183e40
> [  531.802923]  ffff918afbe83600 ffff918afb3a2480 0000000000000000 0000000000000000
> [  531.802978]  ffffffff91226971 ffff918afe084900 ffff918ac01c4c40 3aa437d5d9ae5224
> [  531.803037] Call Trace:
> [  531.803062]  [<ffffffff912344af>] ? propagate_mnt+0x11f/0x150
> [  531.803104]  [<ffffffff91226971>] ? attach_recursive_mnt+0x1d1/0x2f0
> [  531.803146]  [<ffffffff912279cd>] ? do_mount+0xafd/0xc70
> [  531.803182]  [<ffffffff91227e24>] ? SyS_mount+0x84/0xc0
> [  531.803229]  [<ffffffff91611abb>] ? system_call_fast_compare_end+0xc/0x9b
> [  531.803273] Code: 48 89 de eb 09 f6 42 33 04 75 7d 48 89 d6 48 8b 96 d8 00 00 00 48 39 fa 75 eb 48 8b 0d 52 d6 cc 00 4c 8b 0d 53 d6 cc 00 49 39 c9 <48> 8b 51 10 74 5c 48 3b ba d8 00 00 00 75 3f 45 84 c0 75 55 8b
> [  531.805256] RIP  [<ffffffff91234020>] propagate_one+0x160/0x1e0
> [  531.807057]  RSP <ffffb47a40b57e08>
> [  531.808839] CR2: 0000000000000010
> [  531.810618] ---[ end trace f817db65f78e36fc ]---

The same has happened in Amazon-EC2 and on out own KVM servers.

There was a similar bug report in 2016 at
<https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1572316>, but that
one got fixed with "v4.6-rc7~24^2".

Maybe is has already been fixed by
> $ git l1 v4.9.33..v4.9.52 -- fs/pnode.c
> 54fcb2303ef4 mnt: Make propagate_umount less slow for overlapping mount propagation trees
> bb4fbf094b44 mnt: In propgate_umount handle visiting mounts in any order
> e260db757676 mnt: In umount propagation reparent in a separate pass

but I'd like to ask anyway if you have seen such a OOPs before?

For your convenience here's the disassembly, where I think "rcx =
last_source->mnt_parent = NULL":
> 		do {
> 			struct mount *parent = last_source->mnt_parent;
> 			if (last_source == first_source)
> 000000000000029d <propagate_one+0x15d> cmp    %rcx,%r9
> 			p = n->mnt_master;
> 			if (p == dest_master || IS_MNT_MARKED(p))
> 				break;
> 		}
> 		do {
> 			struct mount *parent = last_source->mnt_parent;
> 00000000000002a0 <propagate_one+0x160> mov    0x10(%rcx),%rdx
> 			if (last_source == first_source)
> 00000000000002a4 <propagate_one+0x164> je     0000000000000302 <propagate_one+0x1c2>
> 				break;
> 			done = parent->mnt_master == p;
> 			if (done && peers(n, parent))
> 00000000000002a6 <propagate_one+0x166> cmp    0xd8(%rdx),%rdi
> 00000000000002ad <propagate_one+0x16d> jne    00000000000002ee <propagate_one+0x1ae>
> 00000000000002af <propagate_one+0x16f> test   %r8b,%r8b
> 00000000000002b2 <propagate_one+0x172> jne    0000000000000309 <propagate_one+0x1c9>
> 00000000000002b4 <propagate_one+0x174> mov    0x110(%rsi),%eax

We will try 4.9.52 next and see, if we can still reproduce it.

Philipp
-- 
Philipp Hahn
Open Source Software Engineer

Univention GmbH
be open.
Mary-Somerville-Str. 1
D-28359 Bremen
Tel.: +49 421 22232-0
Fax : +49 421 22232-99
hahn@univention.de

http://www.univention.de/
Geschäftsführer: Peter H. Ganten
HRB 20755 Amtsgericht Bremen
Steuer-Nr.: 71-597-02876

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: OOPS: linux-4.9.33/fs/pnode.c:propagate_one()
  2017-09-28 13:03 OOPS: linux-4.9.33/fs/pnode.c:propagate_one() Philipp Hahn
@ 2017-09-28 17:00 ` Eric W. Biederman
  2017-10-03 16:48   ` Eric W. Biederman
  0 siblings, 1 reply; 4+ messages in thread
From: Eric W. Biederman @ 2017-09-28 17:00 UTC (permalink / raw)
  To: Philipp Hahn; +Cc: linux-fsdevel, linux-kernel, Felix Botner, Dirk Wiesenthal

Philipp Hahn <hahn@univention.de> writes:

> Hello Eric,
>
> we have observed the following OOPS with linux-4.9.33 on several
> occasions when starting a Docker container:
>
>> [  531.801537] Oops: 0000 [#1] SMP
>> [  531.801565] Modules linked in: xt_nat veth ipt_MASQUERADE nf_nat_masquerade_ipv4 xfrm_user xfrm_algo xt_addrtype br_netfilter bridge stp llc overlay ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_tcpudp nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_mangle ip6table_filter ip6_tables xt_conntrack iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables rpcsec_gss_krb5 nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc joydev pcspkr serio_raw virtio_balloon i2c_piix4 evdev quota_v2 quota_tree parport_pc ppdev lp parport autofs4 ext4 crc16 jbd2 fscrypto mbcache dm_snapshot dm_bufio dm_mirror dm_region_hash dm_log dm_mod hid_generic usbhid hid ata_generic cirrus virtio_net virtio_blk ttm ata_piix libata uhci_hcd ehci_hcd psmouseSep 28 13:30:33 master70 kernel: [  531.802171]  drm_kms_helper scsi_mod floppy usbcore virtio_pci button drm virtio_ring virtio usb_common
>> [  531.802247] CPU: 0 PID: 1668 Comm: dockerd Not tainted 4.9.0-ucs104-amd64 #1 Debian 4.9.30-2A~4.2.0.201706171152
>> [  531.802310] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
>> [  531.802347] task: ffff918af7ccaf40 task.stack: ffffb47a40b54000
>> [  531.802385] RIP: 0010:[<ffffffff91234020>]  [<ffffffff91234020>] propagate_one+0x160/0x1e0
>> [  531.802444] RSP: 0018:ffffb47a40b57e08  EFLAGS: 00010282
>> [  531.802479] RAX: 0000000000000001 RBX: ffff918adf43cd80 RCX: 0000000000000000
>> [  531.802524] RDX: ffff918ad69209c0 RSI: ffff918adf43cd80 RDI: ffff918ad69209c0
>> [  531.802569] RBP: ffff918afb3a2480 R08: 0000000000000001 R09: ffff918ac0183e40
>> [  531.802614] R10: 00000000000011b8 R11: 00000000000176b9 R12: ffffb47a40b57e58
>> [  531.802670] R13: 0000000000000000 R14: ffff918adf43cd80 R15: ffff918afbe83600
>> [  531.802716] FS:  00007f6157fff700(0000) GS:ffff918affc00000(0000) knlGS:0000000000000000
>> [  531.802767] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [  531.802804] CR2: 0000000000000010 CR3: 0000000039e60000 CR4: 00000000000006f0
>> [  531.802852] Stack:
>> [  531.802868]  ffff918adf43cd80 ffff918afb3a2480 ffffffff912344af ffff918ac0183e40
>> [  531.802923]  ffff918afbe83600 ffff918afb3a2480 0000000000000000 0000000000000000
>> [  531.802978]  ffffffff91226971 ffff918afe084900 ffff918ac01c4c40 3aa437d5d9ae5224
>> [  531.803037] Call Trace:
>> [  531.803062]  [<ffffffff912344af>] ? propagate_mnt+0x11f/0x150
>> [  531.803104]  [<ffffffff91226971>] ? attach_recursive_mnt+0x1d1/0x2f0
>> [  531.803146]  [<ffffffff912279cd>] ? do_mount+0xafd/0xc70
>> [  531.803182]  [<ffffffff91227e24>] ? SyS_mount+0x84/0xc0
>> [  531.803229]  [<ffffffff91611abb>] ? system_call_fast_compare_end+0xc/0x9b
>> [  531.803273] Code: 48 89 de eb 09 f6 42 33 04 75 7d 48 89 d6 48 8b 96 d8 00 00 00 48 39 fa 75 eb 48 8b 0d 52 d6 cc 00 4c 8b 0d 53 d6 cc 00 49 39 c9 <48> 8b 51 10 74 5c 48 3b ba d8 00 00 00 75 3f 45 84 c0 75 55 8b
>> [  531.805256] RIP  [<ffffffff91234020>] propagate_one+0x160/0x1e0
>> [  531.807057]  RSP <ffffb47a40b57e08>
>> [  531.808839] CR2: 0000000000000010
>> [  531.810618] ---[ end trace f817db65f78e36fc ]---
>
> The same has happened in Amazon-EC2 and on out own KVM servers.
>
> There was a similar bug report in 2016 at
> <https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1572316>, but that
> one got fixed with "v4.6-rc7~24^2".
>
> Maybe is has already been fixed by
>> $ git l1 v4.9.33..v4.9.52 -- fs/pnode.c
>> 54fcb2303ef4 mnt: Make propagate_umount less slow for overlapping mount propagation trees
>> bb4fbf094b44 mnt: In propgate_umount handle visiting mounts in any order
>> e260db757676 mnt: In umount propagation reparent in a separate pass
>
> but I'd like to ask anyway if you have seen such a OOPs before?

I have not.

> For your convenience here's the disassembly, where I think "rcx =
> last_source->mnt_parent = NULL":

If your disassembly is lined up properly with the code my guess would
be that rcx simply holds the value of last_source.  And rdx
is being set to last_source->mnt_parent.

Disassembly a recent build on my machine I have a one byte difference
from yours, but that is what I see happening. rcx == last_source.

Why and how last_source is getting set to NULL is something I need to
read the code for a while to figure out.

>> 		do {
>> 			struct mount *parent = last_source->mnt_parent;
>> 			if (last_source == first_source)
>> 000000000000029d <propagate_one+0x15d> cmp    %rcx,%r9
>> 			p = n->mnt_master;
>> 			if (p == dest_master || IS_MNT_MARKED(p))
>> 				break;
>> 		}
>> 		do {
>> 			struct mount *parent = last_source->mnt_parent;
>> 00000000000002a0 <propagate_one+0x160> mov    0x10(%rcx),%rdx
>> 			if (last_source == first_source)
>> 00000000000002a4 <propagate_one+0x164> je     0000000000000302 <propagate_one+0x1c2>
>> 				break;
>> 			done = parent->mnt_master == p;
>> 			if (done && peers(n, parent))
>> 00000000000002a6 <propagate_one+0x166> cmp    0xd8(%rdx),%rdi
>> 00000000000002ad <propagate_one+0x16d> jne    00000000000002ee <propagate_one+0x1ae>
>> 00000000000002af <propagate_one+0x16f> test   %r8b,%r8b
>> 00000000000002b2 <propagate_one+0x172> jne    0000000000000309 <propagate_one+0x1c9>
>> 00000000000002b4 <propagate_one+0x174> mov    0x110(%rsi),%eax
>
> We will try 4.9.52 next and see, if we can still reproduce it.

I suspect you will be able to as nothing as nothing substantial has
changed in that logic in a while.

I will stare at it and see if I can imagine how you are getting
last_source set to NULL.

Eric

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: OOPS: linux-4.9.33/fs/pnode.c:propagate_one()
  2017-09-28 17:00 ` Eric W. Biederman
@ 2017-10-03 16:48   ` Eric W. Biederman
  2017-10-04 13:18     ` Philipp Hahn
  0 siblings, 1 reply; 4+ messages in thread
From: Eric W. Biederman @ 2017-10-03 16:48 UTC (permalink / raw)
  To: Philipp Hahn; +Cc: linux-fsdevel, linux-kernel, Felix Botner, Dirk Wiesenthal

ebiederm@xmission.com (Eric W. Biederman) writes:

> Philipp Hahn <hahn@univention.de> writes:
>
>> We will try 4.9.52 next and see, if we can still reproduce it.

Can you still reproduce this on 4.9.52?

I am not seeing anything obvious that would result in last_source
becomming NULL there so knowing I am not lookning for a phantom would be very
helpful.

Eric

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: OOPS: linux-4.9.33/fs/pnode.c:propagate_one()
  2017-10-03 16:48   ` Eric W. Biederman
@ 2017-10-04 13:18     ` Philipp Hahn
  0 siblings, 0 replies; 4+ messages in thread
From: Philipp Hahn @ 2017-10-04 13:18 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-fsdevel, linux-kernel, Felix Botner, Dirk Wiesenthal

Hello Eric,

Am 03.10.2017 um 18:48 schrieb Eric W. Biederman:
> ebiederm@xmission.com (Eric W. Biederman) writes:
> 
>> Philipp Hahn <hahn@univention.de> writes:
>>
>>> We will try 4.9.52 next and see, if we can still reproduce it.
> 
> Can you still reproduce this on 4.9.52?

My college Dirk is able to reproduce it only with .33, but not with .52
(yet).

> I am not seeing anything obvious that would result in last_source
> becomming NULL there so knowing I am not lookning for a phantom would be very
> helpful.

Looks like .52 closed the race windows sufficiently enough, so the bug
does not get triggered there.

The bug is triggered when using "docker cp" to copy a file into the
container. It communicates with the long-running 'dockerd', which
rebuilds the overlay FS itself to access the file. The failing syscall is:
> mount("/fw/data_folder", "/var/lib/docker/overlay/038e54bc5db6a180409aa0e5b0f84087b93ac7ffe4eaa3c8908aea79d43a45d3/merged/fwxserver/Data Folder", 0xc82135e5c8, MS_BIND|MS_REC, NULL

I just now noticed that before the mount() docker did this:
> umount("/var/lib/docker/overlay/038e54bc5db6a180409aa0e5b0f84087b93ac7ffe4eaa3c8908aea79d43a45d3/merged/usr/local/filewave/media", MNT_DETACH) = 0
> umount("/var/lib/docker/overlay/038e54bc5db6a180409aa0e5b0f84087b93ac7ffe4eaa3c8908aea79d43a45d3/merged/usr/local/filewave/tmp", MNT_DETACH) = 0
> umount("/var/lib/docker/overlay/038e54bc5db6a180409aa0e5b0f84087b93ac7ffe4eaa3c8908aea79d43a45d3/merged/usr/local/etc", MNT_DETACH) = 0
> umount("/var/lib/docker/overlay/038e54bc5db6a180409aa0e5b0f84087b93ac7ffe4eaa3c8908aea79d43a45d3/merged/usr/local/filewave/log", MNT_DETACH) = 0
> umount("/var/lib/docker/overlay/038e54bc5db6a180409aa0e5b0f84087b93ac7ffe4eaa3c8908aea79d43a45d3/merged/var/lib/univention-appcenter/apps/filewave/data", MNT_DETACH) = 0
> umount("/var/lib/docker/overlay/038e54bc5db6a180409aa0e5b0f84087b93ac7ffe4eaa3c8908aea79d43a45d3/merged/var/lib/univention-appcenter/apps/filewave/conf", MNT_DETACH) = 0
> umount("/var/lib/docker/overlay/038e54bc5db6a180409aa0e5b0f84087b93ac7ffe4eaa3c8908aea79d43a45d3/merged/usr/local/filewave/certs", MNT_DETACH) = 0
> umount("/var/lib/docker/overlay/038e54bc5db6a180409aa0e5b0f84087b93ac7ffe4eaa3c8908aea79d43a45d3/merged/usr/local/filewave/fwcld", MNT_DETACH) = 0
> umount("/var/lib/docker/overlay/038e54bc5db6a180409aa0e5b0f84087b93ac7ffe4eaa3c8908aea79d43a45d3/merged/fwxserver/Data Folder", MNT_DETACH) = 0
> umount("/var/lib/docker/overlay/038e54bc5db6a180409aa0e5b0f84087b93ac7ffe4eaa3c8908aea79d43a45d3/merged/usr/local/filewave/apache/conf", MNT_DETACH) = 0
> umount("/var/lib/docker/overlay/038e54bc5db6a180409aa0e5b0f84087b93ac7ffe4eaa3c8908aea79d43a45d3/merged/fwxserver/DB", MNT_DETACH) = 0
> umount("/var/lib/docker/overlay/038e54bc5db6a180409aa0e5b0f84087b93ac7ffe4eaa3c8908aea79d43a45d3/merged/usr/local/filewave/apache/logs", MNT_DETACH) = 0
> umount("/var/lib/docker/overlay/038e54bc5db6a180409aa0e5b0f84087b93ac7ffe4eaa3c8908aea79d43a45d3/merged/private/var/log", MNT_DETACH) = 0
> umount("/var/lib/docker/overlay/038e54bc5db6a180409aa0e5b0f84087b93ac7ffe4eaa3c8908aea79d43a45d3/merged/usr/local/filewave/apache/passwords", MNT_DETACH) = 0
> umount("/var/lib/docker/overlay/038e54bc5db6a180409aa0e5b0f84087b93ac7ffe4eaa3c8908aea79d43a45d3/merged/sys/fs/cgroup", MNT_DETACH) = 0
> umount("/var/lib/docker/overlay/038e54bc5db6a180409aa0e5b0f84087b93ac7ffe4eaa3c8908aea79d43a45d3/merged/usr/local/filewave/ipa", MNT_DETACH) = 0
> umount("/var/lib/docker/overlay/038e54bc5db6a180409aa0e5b0f84087b93ac7ffe4eaa3c8908aea79d43a45d3/merged/etc/apt/apt.conf.d/80proxy", MNT_DETACH) = 0
> umount("/var/lib/docker/overlay/038e54bc5db6a180409aa0e5b0f84087b93ac7ffe4eaa3c8908aea79d43a45d3/merged/usr/local/filewave/postgresql/conf", MNT_DETACH) = 0
> umount("/var/lib/docker/overlay/038e54bc5db6a180409aa0e5b0f84087b93ac7ffe4eaa3c8908aea79d43a45d3/merged/etc/resolv.conf", MNT_DETACH) = 0
> umount("/var/lib/docker/overlay/038e54bc5db6a180409aa0e5b0f84087b93ac7ffe4eaa3c8908aea79d43a45d3/merged/etc/hostname", MNT_DETACH) = 0
> umount("/var/lib/docker/overlay/038e54bc5db6a180409aa0e5b0f84087b93ac7ffe4eaa3c8908aea79d43a45d3/merged/etc/hosts", MNT_DETACH) = 0

Is MNT_DETACH a problem, as it is described to do a "lazy" umount? Doing
another
> strace -ttt -b execve -f -ff -s 65536 -o /var/tmp/docker.strace -p $(</run/docker.pid)
shows the mount() following the umounts() < 0.2s, so maybe that's the
missing detail.


If you need more data, just ask Dirk: he's able to re-produce the bug
every time.

If you need more data from the kernel, I can build a customized kernel
with more debug output if that helps.

Philipp

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2017-10-04 13:18 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-28 13:03 OOPS: linux-4.9.33/fs/pnode.c:propagate_one() Philipp Hahn
2017-09-28 17:00 ` Eric W. Biederman
2017-10-03 16:48   ` Eric W. Biederman
2017-10-04 13:18     ` Philipp Hahn

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.