* fcntl(F_DUPFD) causing apparent file descriptor table corruption
@ 2020-05-19 21:03 Thiago Macieira
2020-05-19 21:45 ` Al Viro
0 siblings, 1 reply; 5+ messages in thread
From: Thiago Macieira @ 2020-05-19 21:03 UTC (permalink / raw)
To: viro; +Cc: linux-fsdevel
[-- Attachment #1: Type: text/plain, Size: 2732 bytes --]
Hello Al & others
While doing something I shouldn't be doing in my code, I realised that my code
stopped responding. Turns out that after some F_DUPFD forcing the file
descriptor preposterously high, the low file descriptors become EBADF.
Something must be wrong in expand_files, but I don't have the expertise to
debug it. I'm hoping someone else will.
I've attached a testcase for it. It needs to be run as root. It's not threaded
and it reproduces the problem 100% of the time on my stable kernel (5.6.13).
This is not a security issue, since it affects only the calling process
itself.
On my machine, /proc/sys/fs/nr_open is 1073741816 and I have 32 GB of RAM (if
the problem is related to memory consumption).
The problem only occurs when growing the table.
strace shows something like:
fcntl(2, F_DUPFD, 1024) = 1024
close(1024) = 0
fcntl(2, F_DUPFD, 2048) = 2048
close(2048) = 0
fcntl(2, F_DUPFD, 4096) = 4096
close(4096) = 0
fcntl(2, F_DUPFD, 8192) = 8192
close(8192) = 0
fcntl(2, F_DUPFD, 16384) = 16384
close(16384) = 0
fcntl(2, F_DUPFD, 32768) = 32768
close(32768) = 0
fcntl(2, F_DUPFD, 65536) = 65536
close(65536) = 0
fcntl(2, F_DUPFD, 131072) = 131072
close(131072) = 0
fcntl(2, F_DUPFD, 262144) = 262144
close(262144) = 0
fcntl(2, F_DUPFD, 524288) = 524288
close(524288) = 0
fcntl(2, F_DUPFD, 1048576) = 1048576
close(1048576) = 0
fcntl(2, F_DUPFD, 2097152) = 2097152
close(2097152) = 0
fcntl(2, F_DUPFD, 4194304) = 4194304
close(4194304) = 0
fcntl(2, F_DUPFD, 8388608) = 8388608
close(8388608) = 0
fcntl(2, F_DUPFD, 16777216) = 16777216
close(16777216) = 0
fcntl(2, F_DUPFD, 33554432) = 33554432
close(33554432) = 0
fcntl(2, F_DUPFD, 67108864) = 67108864
close(67108864) = 0
fcntl(2, F_DUPFD, 134217728) = 134217728
close(134217728) = 0
fcntl(2, F_DUPFD, 536870912) = 536870912
close(536870912) = 0
write(1, "success\n", 8) = EBADF
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel System Software Products
[-- Attachment #2: dupfd-bug.c --]
[-- Type: text/plain, Size: 1419 bytes --]
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>
#include <unistd.h>
void run_test(int maxfd)
{
int srcfd = STDERR_FILENO;
int minfd = 1024;
/* Linux kernel expands the file descriptor table exponentially, so
* keep requesting a minimum file descriptor exponentially. */
for ( ; minfd < maxfd; minfd *= 2) {
int fd;
do {
fd = fcntl(srcfd, F_DUPFD, minfd);
} while (fd == -1 && errno == EINTR);
if (fd == -1) {
if (errno != EMFILE)
perror("fcntl");
return;
}
close(fd);
}
}
int main(int argc, char **argv)
{
struct rlimit lim;
if (argc > 1) {
lim.rlim_max = lim.rlim_cur = strtol(argv[1], NULL, 0);
} else {
int n;
FILE *f = fopen("/proc/sys/fs/nr_open","r");
if (!f) {
perror("fopen");
return EXIT_FAILURE;
}
if (fscanf(f, "%d", &n) != 1)
return EXIT_FAILURE;
fclose(f);
lim.rlim_max = lim.rlim_cur = n;
}
if (setrlimit(RLIMIT_NOFILE, &lim) == -1) {
perror("setrlimit");
return EXIT_FAILURE;
}
run_test(lim.rlim_cur);
static const char msg[] = "success\n";
int ok1 = write(STDOUT_FILENO, msg, strlen(msg)) == strlen(msg);
return ok1 ? EXIT_SUCCESS : 255;
}
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: fcntl(F_DUPFD) causing apparent file descriptor table corruption
2020-05-19 21:03 fcntl(F_DUPFD) causing apparent file descriptor table corruption Thiago Macieira
@ 2020-05-19 21:45 ` Al Viro
2020-05-19 22:04 ` Al Viro
2020-05-19 22:18 ` Thiago Macieira
0 siblings, 2 replies; 5+ messages in thread
From: Al Viro @ 2020-05-19 21:45 UTC (permalink / raw)
To: Thiago Macieira; +Cc: linux-fsdevel, Linus Torvalds
On Tue, May 19, 2020 at 02:03:03PM -0700, Thiago Macieira wrote:
> On my machine, /proc/sys/fs/nr_open is 1073741816 and I have 32 GB of RAM (if
> the problem is related to memory consumption).
>
> The problem only occurs when growing the table.
>
> strace shows something like:
> fcntl(2, F_DUPFD, 1024) = 1024
> close(1024) = 0
> fcntl(2, F_DUPFD, 2048) = 2048
> close(2048) = 0
> fcntl(2, F_DUPFD, 4096) = 4096
> close(4096) = 0
> fcntl(2, F_DUPFD, 8192) = 8192
> close(8192) = 0
> fcntl(2, F_DUPFD, 16384) = 16384
> close(16384) = 0
> fcntl(2, F_DUPFD, 32768) = 32768
> close(32768) = 0
> fcntl(2, F_DUPFD, 65536) = 65536
> close(65536) = 0
> fcntl(2, F_DUPFD, 131072) = 131072
> close(131072) = 0
> fcntl(2, F_DUPFD, 262144) = 262144
> close(262144) = 0
> fcntl(2, F_DUPFD, 524288) = 524288
> close(524288) = 0
> fcntl(2, F_DUPFD, 1048576) = 1048576
> close(1048576) = 0
> fcntl(2, F_DUPFD, 2097152) = 2097152
> close(2097152) = 0
> fcntl(2, F_DUPFD, 4194304) = 4194304
> close(4194304) = 0
> fcntl(2, F_DUPFD, 8388608) = 8388608
> close(8388608) = 0
> fcntl(2, F_DUPFD, 16777216) = 16777216
> close(16777216) = 0
> fcntl(2, F_DUPFD, 33554432) = 33554432
> close(33554432) = 0
> fcntl(2, F_DUPFD, 67108864) = 67108864
> close(67108864) = 0
> fcntl(2, F_DUPFD, 134217728) = 134217728
> close(134217728) = 0
> fcntl(2, F_DUPFD, 536870912) = 536870912
> close(536870912) = 0
> write(1, "success\n", 8) = EBADF
Er... It's going by powers of two, isn't it? So where's
268435456 between 134217728 and 536870912? And, assuming
it's a 64bit box, 536870912 pointers is 4Gb, which suggests
that we are running into 32bit overflow on calculating
some size... Let's see -
static int expand_fdtable(struct files_struct *files, unsigned int nr)
...
cur_fdt = files_fdtable(files);
BUG_ON(nr < cur_fdt->max_fds);
copy_fdtable(new_fdt, cur_fdt);
so we probably want copy_fdtable().
static void copy_fdtable(struct fdtable *nfdt, struct fdtable *ofdt)
{
unsigned int cpy, set;
BUG_ON(nfdt->max_fds < ofdt->max_fds);
cpy = ofdt->max_fds * sizeof(struct file *);
Right, here's your multiplication overflow.
set = (nfdt->max_fds - ofdt->max_fds) * sizeof(struct file *);
memcpy(nfdt->fd, ofdt->fd, cpy);
... and here's not getting the things copied. Which means that pointer
is left uninitialized and the damn thing might very well be a security
problem - you'd lucked out and ran into NULL, but had there been a pointer
to something, you would've gotten a memory corruptor.
The obvious fix would be to turn cpy and set into size_t - as in
ed fs/file.c <<'EOF'
/copy_fdtable/+2s/unsigned int/size_t/
w
q
EOF
On size_t overflow you would've failed allocation before getting to that
point - see sysctl_nr_open_max initializer. Overflow in alloc_fdtable()
(nr is unsigned int there) also can't happen, AFAICS - the worst you
can get is 1U<<31, which will fail sysctl_nr_open comparison.
I really wonder about the missing couple of syscalls in your strace, though;
could you verify that they _are_ missing and see what the fix above does to
your testcase?
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: fcntl(F_DUPFD) causing apparent file descriptor table corruption
2020-05-19 21:45 ` Al Viro
@ 2020-05-19 22:04 ` Al Viro
2020-05-19 22:18 ` Thiago Macieira
1 sibling, 0 replies; 5+ messages in thread
From: Al Viro @ 2020-05-19 22:04 UTC (permalink / raw)
To: Thiago Macieira; +Cc: linux-fsdevel, Linus Torvalds
On Tue, May 19, 2020 at 10:45:20PM +0100, Al Viro wrote:
> The obvious fix would be to turn cpy and set into size_t - as in
> ed fs/file.c <<'EOF'
> /copy_fdtable/+2s/unsigned int/size_t/
> w
> q
> EOF
>
> On size_t overflow you would've failed allocation before getting to that
> point - see sysctl_nr_open_max initializer. Overflow in alloc_fdtable()
> (nr is unsigned int there) also can't happen, AFAICS - the worst you
> can get is 1U<<31, which will fail sysctl_nr_open comparison.
>
> I really wonder about the missing couple of syscalls in your strace, though;
> could you verify that they _are_ missing and see what the fix above does to
> your testcase?
Anyway, whether it's all there is to your reproducers or not, the bug
is obvious; I've pushed the following into #fixes.
commit 784233a6d4a56f1d0e6e4490fbf38d3cce5742ec
Author: Al Viro <viro@zeniv.linux.org.uk>
Date: Tue May 19 17:48:52 2020 -0400
fix multiplication overflow in copy_fdtable()
cpy and set really should be size_t; we won't get an overflow on that,
since sysctl_nr_open can't be set above ~(size_t)0 / sizeof(void *),
so nr that would've managed to overflow size_t on that multiplication
won't get anywhere near copy_fdtable() - we'll fail with EMFILE
before that.
Cc: stable@kernel.org # v2.6.25+
Fixes: 9cfe015aa424 (get rid of NR_OPEN and introduce a sysctl_nr_open)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
diff --git a/fs/file.c b/fs/file.c
index c8a4e4c86e55..abb8b7081d7a 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -70,7 +70,7 @@ static void copy_fd_bitmaps(struct fdtable *nfdt, struct fdtable *ofdt,
*/
static void copy_fdtable(struct fdtable *nfdt, struct fdtable *ofdt)
{
- unsigned int cpy, set;
+ size_t cpy, set;
BUG_ON(nfdt->max_fds < ofdt->max_fds);
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: fcntl(F_DUPFD) causing apparent file descriptor table corruption
2020-05-19 21:45 ` Al Viro
2020-05-19 22:04 ` Al Viro
@ 2020-05-19 22:18 ` Thiago Macieira
2020-05-19 22:28 ` Al Viro
1 sibling, 1 reply; 5+ messages in thread
From: Thiago Macieira @ 2020-05-19 22:18 UTC (permalink / raw)
To: Al Viro; +Cc: linux-fsdevel, Linus Torvalds
[-- Attachment #1: Type: text/plain, Size: 2067 bytes --]
On Tuesday, 19 May 2020 14:45:20 PDT Al Viro wrote:
> ... and here's not getting the things copied. Which means that pointer
> is left uninitialized and the damn thing might very well be a security
> problem - you'd lucked out and ran into NULL, but had there been a pointer
> to something, you would've gotten a memory corruptor.
Not sure I'd call it a security issue. Only root (CAP_SYS_RESOURCE) can cause
it by raising the file descriptor limit from the defaults. The kernel still
defaults to 4096 and systemd raises it on boot to 512k, which is 3 orders of
magnitude less than what is needed to cause the issue.
> I really wonder about the missing couple of syscalls in your strace, though;
> could you verify that they _are_ missing and see what the fix above does to
> your testcase?
Looking at my terminal backtrace, I might have made a copy & paste mistake of
the trace while flipping pages. Unfortunately, the trace file I had in /tmp
was lost because I needed to reboot the machine. The other traces I have in my
terminal show:
fcntl(2, F_DUPFD, 134217728) = 134217728
close(134217728) = 0
fcntl(2, F_DUPFD, 268435456) = 268435456
close(268435456) = 0
fcntl(2, F_DUPFD, 536870912) = 536870912
close(536870912) = 0
write(1, "success\n", 8) = ?
^C^Czsh: killed sudo strace ./dupfd-bug
I had to killall -9 strace at this point. See the attached oops.
Then I insisted:
fcntl(2, F_DUPFD, 67108864) = 67108864
close(67108864) = 0
fcntl(2, F_DUPFD, 134217728) = 134217728
close(134217728) = 0
fcntl(2, F_DUPFD, 268435456Shared connection to <REDACTED> closed.
At this point, I need to drive to the office to reboot the machine. Building
the kernel and testing will take a few days.
Note to self: don't play with possible kernel bugs without a VM.
--
Thiago Macieira - thiago.macieira (AT) intel.com
Software Architect - Intel System Software Products
[-- Attachment #2: oops.txt --]
[-- Type: text/plain, Size: 4435 bytes --]
[19186.822144] alloc_fd: slot 536870912 not NULL!
[19186.822963] BUG: unable to handle page fault for address: 00000000000d73ef
[19186.823725] #PF: supervisor read access in kernel mode
[19186.824331] #PF: error_code(0x0000) - not-present page
[19186.824950] PGD 0 P4D 0
[19186.825344] Oops: 0000 [#6] SMP PTI
[19186.825813] CPU: 2 PID: 71323 Comm: dupfd-bug Tainted: G D I 5.6.13-952.native #1
[19186.826724] Hardware name: Gigabyte Technology Co., Ltd. X299 AORUS Gaming 3 Pro/X299 AORUS Gaming 3 Pro-CF, BIOS F3 12/28/2017
[19186.827837] RIP: 0010:__fget_light+0x63/0x70
[19186.828374] Code: c0 5d c3 48 8b 4f 20 8b 01 41 39 c0 73 22 44 89 c7 48 39 c7 48 19 ff 48 8b 41 08 41 21 f8 4a 8d 04 c0 48 8b 00 48 85 c0 74 06 <23> 50 44 75 01 c3 31 c0 c3 0f 1f 40 00 55 be 00 40 00 00 48 89 e5
[19186.830091] RSP: 0018:ffffaf8281fb7ed0 EFLAGS: 00010202
[19186.830750] RAX: 00000000000d73ab RBX: 0000000000000001 RCX: ffff8ab156879300
[19186.831537] RDX: 0000000000004000 RSI: 0000000000004000 RDI: ffffffffffffffff
[19186.832326] RBP: ffffaf8281fb7ee0 R08: 0000000000000001 R09: ffff8ab0d5988000
[19186.833121] R10: ffffaf8280127e6d R11: 000000000000000c R12: ffffaf8281fb7f58
[19186.833898] R13: 00005651ef8ac034 R14: 0000000000000008 R15: 0000000000000000
[19186.834672] FS: 00007ff99749d540(0000) GS:ffff8ab69f680000(0000) knlGS:0000000000000000
[19186.835529] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[19186.836192] CR2: 00000000000d73ef CR3: 0000000255988006 CR4: 00000000003606e0
[19186.836967] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[19186.837741] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[19186.838520] Call Trace:
[19186.838920] ? __fdget_pos+0x12/0x50
[19186.839411] ksys_write+0x1a/0xd0
[19186.839877] __x64_sys_write+0x15/0x20
[19186.840381] do_syscall_64+0x55/0xf0
[19186.840866] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[19186.841467] RIP: 0033:0x7ff9973b789a
[19186.841945] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb ba 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 20 b8 01 00 00 00 c5 fc 77 0f 05 <66> 0f 1f 44 00 00 48 3d 00 f0 ff ff 77 60 c3 0f 1f 80 00 00 00 00
[19186.843655] RSP: 002b:00007ffd0f3801c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[19186.844475] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ff9973b789a
[19186.845255] RDX: 0000000000000008 RSI: 00005651ef8ac034 RDI: 0000000000000001
[19186.846041] RBP: 00005651f0ae92a0 R08: 0000000000000000 R09: 0000000000000000
[19186.846825] R10: 0000000000000000 R11: 0000000000000246 R12: 00005651ef8ab1d0
[19186.847606] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[19186.848392] Modules linked in: xt_REDIRECT iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_tables bpfilter intel_wmi_thunderbolt wmi_bmof snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio nfit edac_core libnvdimm nouveau snd_hda_codec_hdmi encrypted_keys trusted tpm snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core psmouse isst_if_common snd_hwdep mxm_wmi serio_raw snd_pcm nvidiafb snd_timer vgastate snd fb_ddc e1000e soundcore mei_me i2c_i801 mei wmi atkbd libps2 i8042
[19186.852589] CR2: 00000000000d73ef
[19186.853068] ---[ end trace de4959d19d1789ea ]---
[19189.681955] RIP: 0010:__fget_files+0x32/0x70
[19189.682535] Code: d0 48 8b 57 20 89 ce 8b 02 41 39 c2 73 49 49 39 c2 48 19 c0 48 8b 52 08 44 21 c0 48 8d 04 c2 4c 8b 18 4d 85 db 74 30 44 89 c8 <41> 23 43 44 75 27 49 8b 43 38 49 8d 53 38 48 85 c0 74 0f 48 8d 0c
[19189.684289] RSP: 0018:ffffaf8281883ec0 EFLAGS: 00010206
[19189.684945] RAX: 0000000000004000 RBX: 0000000000000001 RCX: 0000000000000001
[19189.685750] RDX: ffffaf849e009000 RSI: 0000000000000001 RDI: ffff8aaff7dc1340
[19189.686559] RBP: ffffaf8281883ec8 R08: 0000000000000008 R09: 0000000000004000
[19189.687363] R10: 0000000000000008 R11: 0000000a00000000 R12: ffffaf8281883f58
[19189.688172] R13: 000000000593d323 R14: 0000000000000001 R15: 0000000000000000
[19189.688981] FS: 00007ff99749d540(0000) GS:ffff8ab69f680000(0000) knlGS:0000000000000000
[19189.689860] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[19189.690556] CR2: 00000000000d73ef CR3: 0000000255988006 CR4: 00000000003606e0
[19189.691360] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[19189.692170] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[19189.693774] ------------[ cut here ]------------
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: fcntl(F_DUPFD) causing apparent file descriptor table corruption
2020-05-19 22:18 ` Thiago Macieira
@ 2020-05-19 22:28 ` Al Viro
0 siblings, 0 replies; 5+ messages in thread
From: Al Viro @ 2020-05-19 22:28 UTC (permalink / raw)
To: Thiago Macieira; +Cc: linux-fsdevel, Linus Torvalds
On Tue, May 19, 2020 at 03:18:13PM -0700, Thiago Macieira wrote:
> > I really wonder about the missing couple of syscalls in your strace, though;
> > could you verify that they _are_ missing and see what the fix above does to
> > your testcase?
>
> Looking at my terminal backtrace, I might have made a copy & paste mistake of
> the trace while flipping pages. Unfortunately, the trace file I had in /tmp
> was lost because I needed to reboot the machine. The other traces I have in my
> terminal show:
>
> fcntl(2, F_DUPFD, 134217728) = 134217728
> close(134217728) = 0
> fcntl(2, F_DUPFD, 268435456) = 268435456
> close(268435456) = 0
> fcntl(2, F_DUPFD, 536870912) = 536870912
> close(536870912) = 0
> write(1, "success\n", 8) = ?
> ^C^Czsh: killed sudo strace ./dupfd-bug
>
> I had to killall -9 strace at this point. See the attached oops.
BS values in the array of struct file pointers due to the problem above.
And very likely a memory corruption as well.
> Then I insisted:
>
> fcntl(2, F_DUPFD, 67108864) = 67108864
> close(67108864) = 0
> fcntl(2, F_DUPFD, 134217728) = 134217728
> close(134217728) = 0
> fcntl(2, F_DUPFD, 268435456Shared connection to <REDACTED> closed.
>
> At this point, I need to drive to the office to reboot the machine. Building
> the kernel and testing will take a few days.
>
> Note to self: don't play with possible kernel bugs without a VM.
... at least not without remote console, complete with ability to
powercycle the box.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2020-05-19 22:28 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-19 21:03 fcntl(F_DUPFD) causing apparent file descriptor table corruption Thiago Macieira
2020-05-19 21:45 ` Al Viro
2020-05-19 22:04 ` Al Viro
2020-05-19 22:18 ` Thiago Macieira
2020-05-19 22:28 ` Al Viro
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.