* [Race] data race between blkdev_ioctl() and generic_fadvise()
@ 2020-11-30 14:58 Gong, Sishuai
2020-12-01 20:33 ` Gong, Sishuai
0 siblings, 1 reply; 2+ messages in thread
From: Gong, Sishuai @ 2020-11-30 14:58 UTC (permalink / raw)
To: axboe; +Cc: linux-block, linux-mm
Hi,
We found a data race in linux kernel 5.3.11 that we are able to reproduce in x86 under specific interleavings. Currently, we are not sure about the consequence of this race so we would like to confirm with the community if this is a harmful bug.
------------------------------------------
Writer site
/tmp/tmp.B7zb7od2zE-5.3.11/extract/linux-5.3.11/block/ioctl.c:573
553 case BLKPBSZGET: /* get block device physical block size */
554 return put_uint(arg, bdev_physical_block_size(bdev));
555 case BLKIOMIN:
556 return put_uint(arg, bdev_io_min(bdev));
557 case BLKIOOPT:
558 return put_uint(arg, bdev_io_opt(bdev));
559 case BLKALIGNOFF:
560 return put_int(arg, bdev_alignment_offset(bdev));
561 case BLKDISCARDZEROES:
562 return put_uint(arg, 0);
563 case BLKSECTGET:
564 max_sectors = min_t(unsigned int, USHRT_MAX,
565 queue_max_sectors(bdev_get_queue(bdev)));
566 return put_ushort(arg, max_sectors);
567 case BLKROTATIONAL:
568 return put_ushort(arg, !blk_queue_nonrot(bdev_get_queue(bdev)));
569 case BLKRASET:
570 case BLKFRASET:
571 if(!capable(CAP_SYS_ADMIN))
572 return -EACCES;
==> 573 bdev->bd_bdi->ra_pages = (arg * 512) / PAGE_SIZE;
574 return 0;
575 case BLKBSZSET:
576 return blkdev_bszset(bdev, mode, argp);
577 case BLKPG:
578 return blkpg_ioctl(bdev, argp);
579 case BLKRRPART:
580 return blkdev_reread_part(bdev);
581 case BLKGETSIZE:
582 size = i_size_read(bdev->bd_inode);
583 if ((size >> 9) > ~0UL)
584 return -EFBIG;
585 return put_ulong(arg, size >> 9);
586 case BLKGETSIZE64:
587 return put_u64(arg, i_size_read(bdev->bd_inode));
588 case BLKTRACESTART:
589 case BLKTRACESTOP:
590 case BLKTRACESETUP:
591 case BLKTRACETEARDOWN:
592 return blk_trace_ioctl(bdev, cmd, argp);
593 case IOC_PR_REGISTER:
------------------------------------------
Reader site
/tmp/tmp.B7zb7od2zE-5.3.11/extract/linux-5.3.11/mm/fadvise.c:79
66 /*
67 * Careful about overflows. Len == 0 means "as much as possible". Use
68 * unsigned math because signed overflows are undefined and UBSan
69 * complains.
70 */
71 endbyte = (u64)offset + (u64)len;
72 if (!len || endbyte < len)
73 endbyte = -1;
74 else
75 endbyte--; /* inclusive */
76
77 switch (advice) {
78 case POSIX_FADV_NORMAL:
==> 79 file->f_ra.ra_pages = bdi->ra_pages;
80 spin_lock(&file->f_lock);
81 file->f_mode &= ~FMODE_RANDOM;
82 spin_unlock(&file->f_lock);
83 break;
84 case POSIX_FADV_RANDOM:
85 spin_lock(&file->f_lock);
86 file->f_mode |= FMODE_RANDOM;
87 spin_unlock(&file->f_lock);
88 break;
89 case POSIX_FADV_SEQUENTIAL:
90 file->f_ra.ra_pages = bdi->ra_pages * 2;
91 spin_lock(&file->f_lock);
92 file->f_mode &= ~FMODE_RANDOM;
93 spin_unlock(&file->f_lock);
94 break;
95 case POSIX_FADV_WILLNEED:
96 /* First and last PARTIAL page! */
97 start_index = offset >> PAGE_SHIFT;
98 end_index = endbyte >> PAGE_SHIFT;
------------------------------------------
Writer calling trace
- ksys_ioctl
-- do_vfs_ioctl
--- vfs_ioctl
---- blkdev_ioctl
------------------------------------------
Reader calling trace
- ksys_fadvise64_64
-- vfs_fadvise
--- generic_fadvise
Thanks,
Sishuai
^ permalink raw reply [flat|nested] 2+ messages in thread
* Re: [Race] data race between blkdev_ioctl() and generic_fadvise()
2020-11-30 14:58 [Race] data race between blkdev_ioctl() and generic_fadvise() Gong, Sishuai
@ 2020-12-01 20:33 ` Gong, Sishuai
0 siblings, 0 replies; 2+ messages in thread
From: Gong, Sishuai @ 2020-12-01 20:33 UTC (permalink / raw)
To: axboe; +Cc: linux-block, linux-mm
We also want to provide more information about this data race and hope this could be helpful for you.
------------------------------------------
Interleavings
With more experiments, we observed two interleavings of these two racy instructions.
Interleaving 1
Writer Reader
file->f_ra.ra_pages = bdi->ra_pages;
// read value = 20
bdev->bd_bdi->ra_pages = (arg * 512) / PAGE_SIZE;
// write value = 0
Interleaving 2
bdev->bd_bdi->ra_pages = (arg * 512) / PAGE_SIZE;
// write value = 0
file->f_ra.ra_pages = bdi->ra_pages;
// read value = 0
In both interleavings, the read value would be passed to file->f_ra.ra_pages. However, in our test input, this variable was never touched after that, so we are not sure what impact it can have afterwards.
------------------------------------------
Test input
The concurrent input behind this data race is a pair of single-thread kernel input. We attach them in Syzkaller’s format here.
Input-1
r0 = syz_open_dev$loop(&(0x7f0000000000)='/dev/loop#\x00', 0x0, 0x0)
ioctl$BLKFRASET(r0, 0x1264, 0x0)
Input-2
r0 = syz_open_dev$loop(&(0x7f0000000180)='/dev/loop#\x00', 0x0, 0x0)
fadvise64(r0, 0x0, 0x4130122d, 0x5)
Thanks,
Sishuai
> On Nov 30, 2020, at 9:58 AM, Gong, Sishuai <sishuai@purdue.edu> wrote:
>
> Hi,
>
> We found a data race in linux kernel 5.3.11 that we are able to reproduce in x86 under specific interleavings. Currently, we are not sure about the consequence of this race so we would like to confirm with the community if this is a harmful bug.
>
> ------------------------------------------
> Writer site
>
> /tmp/tmp.B7zb7od2zE-5.3.11/extract/linux-5.3.11/block/ioctl.c:573
> 553 case BLKPBSZGET: /* get block device physical block size */
> 554 return put_uint(arg, bdev_physical_block_size(bdev));
> 555 case BLKIOMIN:
> 556 return put_uint(arg, bdev_io_min(bdev));
> 557 case BLKIOOPT:
> 558 return put_uint(arg, bdev_io_opt(bdev));
> 559 case BLKALIGNOFF:
> 560 return put_int(arg, bdev_alignment_offset(bdev));
> 561 case BLKDISCARDZEROES:
> 562 return put_uint(arg, 0);
> 563 case BLKSECTGET:
> 564 max_sectors = min_t(unsigned int, USHRT_MAX,
> 565 queue_max_sectors(bdev_get_queue(bdev)));
> 566 return put_ushort(arg, max_sectors);
> 567 case BLKROTATIONAL:
> 568 return put_ushort(arg, !blk_queue_nonrot(bdev_get_queue(bdev)));
> 569 case BLKRASET:
> 570 case BLKFRASET:
> 571 if(!capable(CAP_SYS_ADMIN))
> 572 return -EACCES;
> ==> 573 bdev->bd_bdi->ra_pages = (arg * 512) / PAGE_SIZE;
> 574 return 0;
> 575 case BLKBSZSET:
> 576 return blkdev_bszset(bdev, mode, argp);
> 577 case BLKPG:
> 578 return blkpg_ioctl(bdev, argp);
> 579 case BLKRRPART:
> 580 return blkdev_reread_part(bdev);
> 581 case BLKGETSIZE:
> 582 size = i_size_read(bdev->bd_inode);
> 583 if ((size >> 9) > ~0UL)
> 584 return -EFBIG;
> 585 return put_ulong(arg, size >> 9);
> 586 case BLKGETSIZE64:
> 587 return put_u64(arg, i_size_read(bdev->bd_inode));
> 588 case BLKTRACESTART:
> 589 case BLKTRACESTOP:
> 590 case BLKTRACESETUP:
> 591 case BLKTRACETEARDOWN:
> 592 return blk_trace_ioctl(bdev, cmd, argp);
> 593 case IOC_PR_REGISTER:
>
> ------------------------------------------
> Reader site
> /tmp/tmp.B7zb7od2zE-5.3.11/extract/linux-5.3.11/mm/fadvise.c:79
> 66 /*
> 67 * Careful about overflows. Len == 0 means "as much as possible". Use
> 68 * unsigned math because signed overflows are undefined and UBSan
> 69 * complains.
> 70 */
> 71 endbyte = (u64)offset + (u64)len;
> 72 if (!len || endbyte < len)
> 73 endbyte = -1;
> 74 else
> 75 endbyte--; /* inclusive */
> 76
> 77 switch (advice) {
> 78 case POSIX_FADV_NORMAL:
> ==> 79 file->f_ra.ra_pages = bdi->ra_pages;
> 80 spin_lock(&file->f_lock);
> 81 file->f_mode &= ~FMODE_RANDOM;
> 82 spin_unlock(&file->f_lock);
> 83 break;
> 84 case POSIX_FADV_RANDOM:
> 85 spin_lock(&file->f_lock);
> 86 file->f_mode |= FMODE_RANDOM;
> 87 spin_unlock(&file->f_lock);
> 88 break;
> 89 case POSIX_FADV_SEQUENTIAL:
> 90 file->f_ra.ra_pages = bdi->ra_pages * 2;
> 91 spin_lock(&file->f_lock);
> 92 file->f_mode &= ~FMODE_RANDOM;
> 93 spin_unlock(&file->f_lock);
> 94 break;
> 95 case POSIX_FADV_WILLNEED:
> 96 /* First and last PARTIAL page! */
> 97 start_index = offset >> PAGE_SHIFT;
> 98 end_index = endbyte >> PAGE_SHIFT;
>
>
>
> ------------------------------------------
> Writer calling trace
>
> - ksys_ioctl
> -- do_vfs_ioctl
> --- vfs_ioctl
> ---- blkdev_ioctl
>
> ------------------------------------------
> Reader calling trace
> - ksys_fadvise64_64
> -- vfs_fadvise
> --- generic_fadvise
>
>
>
> Thanks,
> Sishuai
>
^ permalink raw reply [flat|nested] 2+ messages in thread
end of thread, other threads:[~2020-12-01 20:33 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-30 14:58 [Race] data race between blkdev_ioctl() and generic_fadvise() Gong, Sishuai
2020-12-01 20:33 ` Gong, Sishuai
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).