linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [Race] data race between blkdev_ioctl() and generic_fadvise()
@ 2020-11-30 14:58 Gong, Sishuai
  2020-12-01 20:33 ` Gong, Sishuai
  0 siblings, 1 reply; 2+ messages in thread
From: Gong, Sishuai @ 2020-11-30 14:58 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-mm

Hi,

We found a data race in linux kernel 5.3.11 that we are able to reproduce in x86 under specific interleavings. Currently, we are not sure about the consequence of this race so we would like to confirm with the community if this is a harmful bug.

------------------------------------------
Writer site

 /tmp/tmp.B7zb7od2zE-5.3.11/extract/linux-5.3.11/block/ioctl.c:573
        553      case BLKPBSZGET: /* get block device physical block size */
        554          return put_uint(arg, bdev_physical_block_size(bdev));
        555      case BLKIOMIN:
        556          return put_uint(arg, bdev_io_min(bdev));
        557      case BLKIOOPT:
        558          return put_uint(arg, bdev_io_opt(bdev));
        559      case BLKALIGNOFF:
        560          return put_int(arg, bdev_alignment_offset(bdev));
        561      case BLKDISCARDZEROES:
        562          return put_uint(arg, 0);
        563      case BLKSECTGET:
        564          max_sectors = min_t(unsigned int, USHRT_MAX,
        565                      queue_max_sectors(bdev_get_queue(bdev)));
        566          return put_ushort(arg, max_sectors);
        567      case BLKROTATIONAL:
        568          return put_ushort(arg, !blk_queue_nonrot(bdev_get_queue(bdev)));
        569      case BLKRASET:
        570      case BLKFRASET:
        571          if(!capable(CAP_SYS_ADMIN))
        572              return -EACCES;
 ==>    573          bdev->bd_bdi->ra_pages = (arg * 512) / PAGE_SIZE;
        574          return 0;
        575      case BLKBSZSET:
        576          return blkdev_bszset(bdev, mode, argp);
        577      case BLKPG:
        578          return blkpg_ioctl(bdev, argp);
        579      case BLKRRPART:
        580          return blkdev_reread_part(bdev);
        581      case BLKGETSIZE:
        582          size = i_size_read(bdev->bd_inode);
        583          if ((size >> 9) > ~0UL)
        584              return -EFBIG;
        585          return put_ulong(arg, size >> 9);
        586      case BLKGETSIZE64:
        587          return put_u64(arg, i_size_read(bdev->bd_inode));
        588      case BLKTRACESTART:
        589      case BLKTRACESTOP:
        590      case BLKTRACESETUP:
        591      case BLKTRACETEARDOWN:
        592          return blk_trace_ioctl(bdev, cmd, argp);
        593      case IOC_PR_REGISTER:

------------------------------------------
Reader site
/tmp/tmp.B7zb7od2zE-5.3.11/extract/linux-5.3.11/mm/fadvise.c:79
         66      /*
         67       * Careful about overflows. Len == 0 means "as much as possible".  Use
         68       * unsigned math because signed overflows are undefined and UBSan
         69       * complains.
         70       */
         71      endbyte = (u64)offset + (u64)len;
         72      if (!len || endbyte < len)
         73          endbyte = -1;
         74      else
         75          endbyte--;      /* inclusive */
         76
         77      switch (advice) {
         78      case POSIX_FADV_NORMAL:
 ==>     79          file->f_ra.ra_pages = bdi->ra_pages;
         80          spin_lock(&file->f_lock);
         81          file->f_mode &= ~FMODE_RANDOM;
         82          spin_unlock(&file->f_lock);
         83          break;
         84      case POSIX_FADV_RANDOM:
         85          spin_lock(&file->f_lock);
         86          file->f_mode |= FMODE_RANDOM;
         87          spin_unlock(&file->f_lock);
         88          break;
         89      case POSIX_FADV_SEQUENTIAL:
         90          file->f_ra.ra_pages = bdi->ra_pages * 2;
         91          spin_lock(&file->f_lock);
         92          file->f_mode &= ~FMODE_RANDOM;
         93          spin_unlock(&file->f_lock);
         94          break;
         95      case POSIX_FADV_WILLNEED:
         96          /* First and last PARTIAL page! */
         97          start_index = offset >> PAGE_SHIFT;
         98          end_index = endbyte >> PAGE_SHIFT;



------------------------------------------
Writer calling trace

- ksys_ioctl
-- do_vfs_ioctl
--- vfs_ioctl
---- blkdev_ioctl

------------------------------------------
Reader calling trace
- ksys_fadvise64_64
-- vfs_fadvise
--- generic_fadvise



Thanks,
Sishuai



^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [Race] data race between blkdev_ioctl() and generic_fadvise()
  2020-11-30 14:58 [Race] data race between blkdev_ioctl() and generic_fadvise() Gong, Sishuai
@ 2020-12-01 20:33 ` Gong, Sishuai
  0 siblings, 0 replies; 2+ messages in thread
From: Gong, Sishuai @ 2020-12-01 20:33 UTC (permalink / raw)
  To: axboe; +Cc: linux-block, linux-mm

We also want to provide more information about this data race and hope this could be helpful for you. 

------------------------------------------
Interleavings
With more experiments, we observed two interleavings of these two racy instructions.

Interleaving 1
Writer								Reader
								file->f_ra.ra_pages = bdi->ra_pages;
								// read value = 20
bdev->bd_bdi->ra_pages = (arg * 512) / PAGE_SIZE;
// write value = 0

Interleaving 2
bdev->bd_bdi->ra_pages = (arg * 512) / PAGE_SIZE;
// write value = 0
								file->f_ra.ra_pages = bdi->ra_pages;
								// read value = 0

In both interleavings, the read value would be passed to file->f_ra.ra_pages. However, in our test input, this variable was never touched after that, so we are not sure what impact it can have afterwards.

------------------------------------------
Test input
The concurrent input behind this data race is a pair of single-thread kernel input. We attach them in Syzkaller’s format here.
 
Input-1
r0 = syz_open_dev$loop(&(0x7f0000000000)='/dev/loop#\x00', 0x0, 0x0)
ioctl$BLKFRASET(r0, 0x1264, 0x0)

Input-2
r0 = syz_open_dev$loop(&(0x7f0000000180)='/dev/loop#\x00', 0x0, 0x0)
fadvise64(r0, 0x0, 0x4130122d, 0x5)


Thanks,
Sishuai

> On Nov 30, 2020, at 9:58 AM, Gong, Sishuai <sishuai@purdue.edu> wrote:
> 
> Hi,
> 
> We found a data race in linux kernel 5.3.11 that we are able to reproduce in x86 under specific interleavings. Currently, we are not sure about the consequence of this race so we would like to confirm with the community if this is a harmful bug.
> 
> ------------------------------------------
> Writer site
> 
> /tmp/tmp.B7zb7od2zE-5.3.11/extract/linux-5.3.11/block/ioctl.c:573
>        553      case BLKPBSZGET: /* get block device physical block size */
>        554          return put_uint(arg, bdev_physical_block_size(bdev));
>        555      case BLKIOMIN:
>        556          return put_uint(arg, bdev_io_min(bdev));
>        557      case BLKIOOPT:
>        558          return put_uint(arg, bdev_io_opt(bdev));
>        559      case BLKALIGNOFF:
>        560          return put_int(arg, bdev_alignment_offset(bdev));
>        561      case BLKDISCARDZEROES:
>        562          return put_uint(arg, 0);
>        563      case BLKSECTGET:
>        564          max_sectors = min_t(unsigned int, USHRT_MAX,
>        565                      queue_max_sectors(bdev_get_queue(bdev)));
>        566          return put_ushort(arg, max_sectors);
>        567      case BLKROTATIONAL:
>        568          return put_ushort(arg, !blk_queue_nonrot(bdev_get_queue(bdev)));
>        569      case BLKRASET:
>        570      case BLKFRASET:
>        571          if(!capable(CAP_SYS_ADMIN))
>        572              return -EACCES;
> ==>    573          bdev->bd_bdi->ra_pages = (arg * 512) / PAGE_SIZE;
>        574          return 0;
>        575      case BLKBSZSET:
>        576          return blkdev_bszset(bdev, mode, argp);
>        577      case BLKPG:
>        578          return blkpg_ioctl(bdev, argp);
>        579      case BLKRRPART:
>        580          return blkdev_reread_part(bdev);
>        581      case BLKGETSIZE:
>        582          size = i_size_read(bdev->bd_inode);
>        583          if ((size >> 9) > ~0UL)
>        584              return -EFBIG;
>        585          return put_ulong(arg, size >> 9);
>        586      case BLKGETSIZE64:
>        587          return put_u64(arg, i_size_read(bdev->bd_inode));
>        588      case BLKTRACESTART:
>        589      case BLKTRACESTOP:
>        590      case BLKTRACESETUP:
>        591      case BLKTRACETEARDOWN:
>        592          return blk_trace_ioctl(bdev, cmd, argp);
>        593      case IOC_PR_REGISTER:
> 
> ------------------------------------------
> Reader site
> /tmp/tmp.B7zb7od2zE-5.3.11/extract/linux-5.3.11/mm/fadvise.c:79
>         66      /*
>         67       * Careful about overflows. Len == 0 means "as much as possible".  Use
>         68       * unsigned math because signed overflows are undefined and UBSan
>         69       * complains.
>         70       */
>         71      endbyte = (u64)offset + (u64)len;
>         72      if (!len || endbyte < len)
>         73          endbyte = -1;
>         74      else
>         75          endbyte--;      /* inclusive */
>         76
>         77      switch (advice) {
>         78      case POSIX_FADV_NORMAL:
> ==>     79          file->f_ra.ra_pages = bdi->ra_pages;
>         80          spin_lock(&file->f_lock);
>         81          file->f_mode &= ~FMODE_RANDOM;
>         82          spin_unlock(&file->f_lock);
>         83          break;
>         84      case POSIX_FADV_RANDOM:
>         85          spin_lock(&file->f_lock);
>         86          file->f_mode |= FMODE_RANDOM;
>         87          spin_unlock(&file->f_lock);
>         88          break;
>         89      case POSIX_FADV_SEQUENTIAL:
>         90          file->f_ra.ra_pages = bdi->ra_pages * 2;
>         91          spin_lock(&file->f_lock);
>         92          file->f_mode &= ~FMODE_RANDOM;
>         93          spin_unlock(&file->f_lock);
>         94          break;
>         95      case POSIX_FADV_WILLNEED:
>         96          /* First and last PARTIAL page! */
>         97          start_index = offset >> PAGE_SHIFT;
>         98          end_index = endbyte >> PAGE_SHIFT;
> 
> 
> 
> ------------------------------------------
> Writer calling trace
> 
> - ksys_ioctl
> -- do_vfs_ioctl
> --- vfs_ioctl
> ---- blkdev_ioctl
> 
> ------------------------------------------
> Reader calling trace
> - ksys_fadvise64_64
> -- vfs_fadvise
> --- generic_fadvise
> 
> 
> 
> Thanks,
> Sishuai
> 


^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2020-12-01 20:33 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-30 14:58 [Race] data race between blkdev_ioctl() and generic_fadvise() Gong, Sishuai
2020-12-01 20:33 ` Gong, Sishuai

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).