* null_handle_cmd() doesn't initialize data when reading @ 2019-11-15 10:16 Alexander Potapenko 2019-11-20 23:12 ` Jens Axboe 0 siblings, 1 reply; 14+ messages in thread From: Alexander Potapenko @ 2019-11-15 10:16 UTC (permalink / raw) To: Jens Axboe; +Cc: linux-block, Willem de Bruijn, Dmitriy Vyukov Hi Jens, I'm debugging an issue in nullb driver reported by KMSAN at QEMU startup. There are numerous reports like the one below when checking nullb for different partition types. Basically, read_dev_sector() allocates a cache page which is then wrapped into a bio and passed to the device driver, but never initialized. I've tracked the problem down to a call to null_handle_cmd(cmd, /*sector*/0, /*nr_sectors*/8, /*op*/0). Turns out all the if-branches in this function are skipped, so neither of null_handle_throttled(), null_handle_flush(), null_handle_badblocks(), null_handle_memory_backed(), null_handle_zoned() is executed, and we proceed directly to nullb_complete_cmd(). As a result, the pages read from the nullb device are never initialized, at least at boot time. How can we fix this? This bug may also have something to do with https://groups.google.com/d/topic/syzkaller-bugs/d0fmiL9Vi9k/discussion. KMSAN report follows: ===================================================== BUG: KMSAN: uninit-value in[< none >] adfspart_check_ICS+0xd08/0x1040 block/partitions/acorn.c:365 Call Trace: [< inline >] __dump_stack lib/dump_stack.c:77 [< none >] dump_stack+0x196/0x1f0 lib/dump_stack.c:113 [< none >] kmsan_report+0x127/0x220 mm/kmsan/kmsan_report.c:108 [< none >] __msan_warning+0x73/0xe0 mm/kmsan/kmsan_instr.c:245 [< none >] adfspart_check_ICS+0xd08/0x1040 block/partitions/acorn.c:365 [< none >] check_partition+0x58c/0xc20 block/partitions/check.c:167 [< none >] rescan_partitions+0x39b/0x1ff0 block/partition-generic.c:531 [< none >] __blkdev_get+0x14f1/0x2440 fs/block_dev.c:1600 [< none >] blkdev_get+0x237/0x6a0 fs/block_dev.c:1708 [< inline >] register_disk block/genhd.c:655 [< none >] __device_add_disk+0x1612/0x20f0 block/genhd.c:745 [< none >] device_add_disk+0x90/0xa0 block/genhd.c:763 [< inline >] add_disk ./include/linux/genhd.h:429 [< inline >] null_gendisk_register drivers/block/null_blk_main.c:1547 [< none >] null_add_dev+0x34c7/0x3b30 drivers/block/null_blk_main.c:1718 ... Uninit was created at: [< none >] kmsan_save_stack_with_flags+0x3f/0x90 mm/kmsan/kmsan.c:151 [< inline >] kmsan_internal_alloc_meta_for_pages mm/kmsan/kmsan_shadow.c:362 [< none >] kmsan_alloc_page+0x14e/0x360 mm/kmsan/kmsan_shadow.c:391 [< none >] __alloc_pages_nodemask+0x594e/0x6050 mm/page_alloc.c:4796 [< inline >] __alloc_pages ./include/linux/gfp.h:475 [< inline >] alloc_page_interleave mm/mempolicy.c:2058 [< none >] alloc_pages_current+0x2e7/0x990 mm/mempolicy.c:2186 [< inline >] alloc_pages ./include/linux/gfp.h:511 [< none >] __page_cache_alloc+0x95/0x310 mm/filemap.c:981 [< none >] do_read_cache_page+0x4d5/0x1520 mm/filemap.c:2788 [< none >] read_cache_page+0xf3/0x110 mm/filemap.c:2896 [< inline >] read_mapping_page ./include/linux/pagemap.h:396 [< none >] read_dev_sector+0xd6/0x390 block/partition-generic.c:668 [< inline >] read_part_sector block/partitions/check.h:38 [< none >] adfspart_check_ICS+0x117/0x1040 block/partitions/acorn.c:361 [< none >] check_partition+0x58c/0xc20 block/partitions/check.c:167 [< none >] rescan_partitions+0x39b/0x1ff0 block/partition-generic.c:531 [< none >] __blkdev_get+0x14f1/0x2440 fs/block_dev.c:1600 [< none >] blkdev_get+0x237/0x6a0 fs/block_dev.c:1708 [< inline >] register_disk block/genhd.c:655 ========================================== Thanks, Alexander Potapenko Software Engineer Google Germany GmbH Erika-Mann-Straße, 33 80636 München Geschäftsführer: Paul Manicle, Halimah DeLaine Prado Registergericht und -nummer: Hamburg, HRB 86891 Sitz der Gesellschaft: Hamburg ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: null_handle_cmd() doesn't initialize data when reading 2019-11-15 10:16 null_handle_cmd() doesn't initialize data when reading Alexander Potapenko @ 2019-11-20 23:12 ` Jens Axboe 2019-11-22 11:58 ` Alexander Potapenko 0 siblings, 1 reply; 14+ messages in thread From: Jens Axboe @ 2019-11-20 23:12 UTC (permalink / raw) To: Alexander Potapenko; +Cc: linux-block, Willem de Bruijn, Dmitriy Vyukov On 11/15/19 3:16 AM, Alexander Potapenko wrote: > Hi Jens, > > I'm debugging an issue in nullb driver reported by KMSAN at QEMU startup. > There are numerous reports like the one below when checking nullb for > different partition types. > Basically, read_dev_sector() allocates a cache page which is then > wrapped into a bio and passed to the device driver, but never > initialized. > > I've tracked the problem down to a call to null_handle_cmd(cmd, > /*sector*/0, /*nr_sectors*/8, /*op*/0). > Turns out all the if-branches in this function are skipped, so neither > of null_handle_throttled(), null_handle_flush(), > null_handle_badblocks(), null_handle_memory_backed(), > null_handle_zoned() is executed, and we proceed directly to > nullb_complete_cmd(). > > As a result, the pages read from the nullb device are never > initialized, at least at boot time. > How can we fix this? > > This bug may also have something to do with > https://groups.google.com/d/topic/syzkaller-bugs/d0fmiL9Vi9k/discussion. Probably just want to have the read path actually memset() them to zero, or something like that. -- Jens Axboe ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: null_handle_cmd() doesn't initialize data when reading 2019-11-20 23:12 ` Jens Axboe @ 2019-11-22 11:58 ` Alexander Potapenko 2019-11-25 4:01 ` Chaitanya Kulkarni 0 siblings, 1 reply; 14+ messages in thread From: Alexander Potapenko @ 2019-11-22 11:58 UTC (permalink / raw) To: Jens Axboe; +Cc: linux-block, Willem de Bruijn, Dmitriy Vyukov On Thu, Nov 21, 2019 at 3:00 AM Jens Axboe <axboe@kernel.dk> wrote: > > On 11/15/19 3:16 AM, Alexander Potapenko wrote: > > Hi Jens, > > > > I'm debugging an issue in nullb driver reported by KMSAN at QEMU startup. > > There are numerous reports like the one below when checking nullb for > > different partition types. > > Basically, read_dev_sector() allocates a cache page which is then > > wrapped into a bio and passed to the device driver, but never > > initialized. > > > > I've tracked the problem down to a call to null_handle_cmd(cmd, > > /*sector*/0, /*nr_sectors*/8, /*op*/0). > > Turns out all the if-branches in this function are skipped, so neither > > of null_handle_throttled(), null_handle_flush(), > > null_handle_badblocks(), null_handle_memory_backed(), > > null_handle_zoned() is executed, and we proceed directly to > > nullb_complete_cmd(). > > > > As a result, the pages read from the nullb device are never > > initialized, at least at boot time. > > How can we fix this? > > > > This bug may also have something to do with > > https://groups.google.com/d/topic/syzkaller-bugs/d0fmiL9Vi9k/discussion. > > Probably just want to have the read path actually memset() them to > zero, or something like that. I'm using the following patch in KMSAN tree to suppress these bugs: diff --git a/drivers/block/null_blk_main.c b/drivers/block/null_blk_main.c index 0e7da5015ccd5..9e8e99bdc0db3 100644 --- a/drivers/block/null_blk_main.c +++ b/drivers/block/null_blk_main.c @@ -1235,8 +1235,16 @@ static blk_status_t null_handle_cmd(struct nullb_cmd *cmd, sector_t sector, if (dev->memory_backed) cmd->error = null_handle_memory_backed(cmd, op); - if (!cmd->error && dev->zoned) + if (!cmd->error && dev->zoned) { cmd->error = null_handle_zoned(cmd, op, sector, nr_sectors); + } else if (dev->queue_mode != NULL_Q_BIO) { + /* + * TODO(glider): this is a hack to suppress bugs in nullb + * driver. I have no idea what I'm doing, better wait for a + * proper fix from Jens Axboe. + */ + cmd->error = null_handle_rq(cmd); + } It appears to fix the problem, but I'm not really sure this is the right thing to do (e.g. whether it covers all invocations of null_handle_cmd(), probably not) I don't see an easy way to memset() the pages being read, as there're no pages at this point. > -- > Jens Axboe > -- Alexander Potapenko Software Engineer Google Germany GmbH Erika-Mann-Straße, 33 80636 München Geschäftsführer: Paul Manicle, Halimah DeLaine Prado Registergericht und -nummer: Hamburg, HRB 86891 Sitz der Gesellschaft: Hamburg ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: null_handle_cmd() doesn't initialize data when reading 2019-11-22 11:58 ` Alexander Potapenko @ 2019-11-25 4:01 ` Chaitanya Kulkarni 2020-05-10 10:03 ` Alexander Potapenko 0 siblings, 1 reply; 14+ messages in thread From: Chaitanya Kulkarni @ 2019-11-25 4:01 UTC (permalink / raw) To: Alexander Potapenko, Jens Axboe Cc: linux-block, Willem de Bruijn, Dmitriy Vyukov On 11/22/2019 03:58 AM, Alexander Potapenko wrote: > I'm using the following patch in KMSAN tree to suppress these bugs: > > diff --git a/drivers/block/null_blk_main.c b/drivers/block/null_blk_main.c > index 0e7da5015ccd5..9e8e99bdc0db3 100644 > --- a/drivers/block/null_blk_main.c > +++ b/drivers/block/null_blk_main.c > @@ -1235,8 +1235,16 @@ static blk_status_t null_handle_cmd(struct > nullb_cmd *cmd, sector_t sector, > if (dev->memory_backed) > cmd->error = null_handle_memory_backed(cmd, op); > > - if (!cmd->error && dev->zoned) > + if (!cmd->error && dev->zoned) { > cmd->error = null_handle_zoned(cmd, op, sector, nr_sectors); > + } else if (dev->queue_mode != NULL_Q_BIO) { > + /* > + * TODO(glider): this is a hack to suppress bugs in nullb > + * driver. I have no idea what I'm doing, better wait for a > + * proper fix from Jens Axboe. > + */ > + cmd->error = null_handle_rq(cmd); > + } This is just based on what I read in the code, I don't have the kmsan support. null_handle_rq() should not be called without checking the dev->memory_backed value since it handles memory backed operation, (it may be working because of the correct error handling done in the copy_from_nullb() when t_page == NULL). Possible explanation could be for the above code fixing the problem is:- null_handle_cmd() None of the ifs are handled (since it is default behavior) calls null_handle_rq() Processes each bvec in this request :- rq_for_each_segment() calls null_transfer() calls copy_from_null() Now here in copy_from_null(), null_lookup_page() returns NULL since it is not present in the radix tree(dev and cache since its !membacked && !cache_size I assume since I don't have the command line of modprobe null_blk), and then does the dest memset():- (from linux-block for-next) 1009 if (!t_page) { 1010 memset(dst + off + count, 0, temp); 1011 goto next; 1012 } which initializes the destination page to 0. This is what Jens is referring to I guess that we need to memset() it for the read when !membacked. It will be great if you can verify the above code path on your setup with above memset(). The above patch does fix the problem but it is very confusing. What we need is to traverse the rq bvec and actually initialize the bvec.page and repeat the pattern present in the copy_from_null() in such a way this common code is shared between membacked and non-membacked mode for REQ_OP_READ. Jens can correct me if I'm wrong. I'll be happy to look into this if Jens is okay. > > It appears to fix the problem, but I'm not really sure this is the > right thing to do (e.g. whether it covers all invocations of > null_handle_cmd(), probably not) > I don't see an easy way to memset() the pages being read, as there're > no pages at this point. > >> >-- ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: null_handle_cmd() doesn't initialize data when reading 2019-11-25 4:01 ` Chaitanya Kulkarni @ 2020-05-10 10:03 ` Alexander Potapenko 2020-05-10 16:20 ` Bart Van Assche 0 siblings, 1 reply; 14+ messages in thread From: Alexander Potapenko @ 2020-05-10 10:03 UTC (permalink / raw) To: Chaitanya Kulkarni, Damien Le Moal Cc: Jens Axboe, linux-block, Willem de Bruijn, Dmitriy Vyukov (+Damien Le Moal) (more context at https://lore.kernel.org/linux-block/CAG_fn=VBHmBgqLi35tD27NRLH2tEZLH=Y+rTfZ3rKNz9ipG+jQ@mail.gmail.com/) On Mon, Nov 25, 2019 at 5:01 AM Chaitanya Kulkarni <Chaitanya.Kulkarni@wdc.com> wrote: > > On 11/22/2019 03:58 AM, Alexander Potapenko wrote: > > I'm using the following patch in KMSAN tree to suppress these bugs: > > > > diff --git a/drivers/block/null_blk_main.c b/drivers/block/null_blk_main.c > > index 0e7da5015ccd5..9e8e99bdc0db3 100644 > > --- a/drivers/block/null_blk_main.c > > +++ b/drivers/block/null_blk_main.c > > @@ -1235,8 +1235,16 @@ static blk_status_t null_handle_cmd(struct > > nullb_cmd *cmd, sector_t sector, > > if (dev->memory_backed) > > cmd->error = null_handle_memory_backed(cmd, op); > > > > - if (!cmd->error && dev->zoned) > > + if (!cmd->error && dev->zoned) { > > cmd->error = null_handle_zoned(cmd, op, sector, nr_sectors); > > + } else if (dev->queue_mode != NULL_Q_BIO) { > > + /* > > + * TODO(glider): this is a hack to suppress bugs in nullb > > + * driver. I have no idea what I'm doing, better wait for a > > + * proper fix from Jens Axboe. > > + */ > > + cmd->error = null_handle_rq(cmd); > > + } > > This is just based on what I read in the code, I don't have > the kmsan support. > > null_handle_rq() should not be called without checking the > dev->memory_backed value since it handles memory backed operation, > (it may be working because of the correct error handling done > in the copy_from_nullb() when t_page == NULL). Thanks for the explanation! The code has changed recently, and my patch does not apply anymore, yet the problem still persists. I ended up just calling null_handle_rq() at the end of null_process_cmd(), but we probably need a cleaner fix. > Possible explanation could be for the above code fixing the problem > is:- > > null_handle_cmd() > None of the ifs are handled (since it is default behavior) > calls null_handle_rq() > Processes each bvec in this request :- > rq_for_each_segment() > calls null_transfer() > calls copy_from_null() > > Now here in copy_from_null(), null_lookup_page() returns > NULL since it is not present in the radix tree(dev and cache > since its !membacked && !cache_size I assume since I don't have > the command line of modprobe null_blk), > > and then does the dest memset():- > > (from linux-block for-next) > 1009 if (!t_page) { > 1010 memset(dst + off + count, 0, temp); > 1011 goto next; > 1012 } > > which initializes the destination page to 0. This is what Jens > is referring to I guess that we need to memset() it for the > read when !membacked. > > It will be great if you can verify the above code path on your > setup with above memset(). > > The above patch does fix the problem but it is very confusing. > > What we need is to traverse the rq bvec and actually initialize > the bvec.page and repeat the pattern present in the copy_from_null() > in such a way this common code is shared between membacked and > non-membacked mode for REQ_OP_READ. > > Jens can correct me if I'm wrong. > > I'll be happy to look into this if Jens is okay. > > > > > It appears to fix the problem, but I'm not really sure this is the > > right thing to do (e.g. whether it covers all invocations of > > null_handle_cmd(), probably not) > > I don't see an easy way to memset() the pages being read, as there're > > no pages at this point. > > > > > >> >-- > -- Alexander Potapenko Software Engineer Google Germany GmbH Erika-Mann-Straße, 33 80636 München Geschäftsführer: Paul Manicle, Halimah DeLaine Prado Registergericht und -nummer: Hamburg, HRB 86891 Sitz der Gesellschaft: Hamburg ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: null_handle_cmd() doesn't initialize data when reading 2020-05-10 10:03 ` Alexander Potapenko @ 2020-05-10 16:20 ` Bart Van Assche 2020-05-11 12:58 ` Alexander Potapenko 0 siblings, 1 reply; 14+ messages in thread From: Bart Van Assche @ 2020-05-10 16:20 UTC (permalink / raw) To: Alexander Potapenko, Chaitanya Kulkarni, Damien Le Moal Cc: Jens Axboe, linux-block, Willem de Bruijn, Dmitriy Vyukov On 2020-05-10 03:03, Alexander Potapenko wrote: > Thanks for the explanation! > The code has changed recently, and my patch does not apply anymore, > yet the problem still persists. > I ended up just calling null_handle_rq() at the end of > null_process_cmd(), but we probably need a cleaner fix. Does this (totally untested) patch help? copy_to_nullb() guarantees that it will write some data to the pages that it allocates but does not guarantee yet that all data of the pages it allocates is initialized. diff --git a/drivers/block/null_blk_main.c b/drivers/block/null_blk_main.c index 8efd8778e209..06f5761fccb6 100644 --- a/drivers/block/null_blk_main.c +++ b/drivers/block/null_blk_main.c @@ -848,7 +848,7 @@ static struct nullb_page *null_insert_page(struct nullb *nullb, spin_unlock_irq(&nullb->lock); - t_page = null_alloc_page(GFP_NOIO); + t_page = null_alloc_page(GFP_NOIO | __GFP_ZERO); if (!t_page) goto out_lock; ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: null_handle_cmd() doesn't initialize data when reading 2020-05-10 16:20 ` Bart Van Assche @ 2020-05-11 12:58 ` Alexander Potapenko 2020-05-11 13:01 ` Damien Le Moal 2020-05-11 23:18 ` Bart Van Assche 0 siblings, 2 replies; 14+ messages in thread From: Alexander Potapenko @ 2020-05-11 12:58 UTC (permalink / raw) To: Bart Van Assche Cc: Chaitanya Kulkarni, Damien Le Moal, Jens Axboe, linux-block, Willem de Bruijn, Dmitriy Vyukov On Sun, May 10, 2020 at 6:20 PM Bart Van Assche <bvanassche@acm.org> wrote: > > On 2020-05-10 03:03, Alexander Potapenko wrote: > > Thanks for the explanation! > > The code has changed recently, and my patch does not apply anymore, > > yet the problem still persists. > > I ended up just calling null_handle_rq() at the end of > > null_process_cmd(), but we probably need a cleaner fix. > > Does this (totally untested) patch help? copy_to_nullb() guarantees that > it will write some data to the pages that it allocates but does not > guarantee yet that all data of the pages it allocates is initialized. No, this does not help. Apparently null_insert_page() is never called in this scenario. If I modify __page_cache_alloc() to allocate zero-initialized pages, the reports go away. This means there's no other uninitialized buffer that's copied to the page cache, the nullb driver just forgets to write anything to the page cache. > diff --git a/drivers/block/null_blk_main.c b/drivers/block/null_blk_main.c > index 8efd8778e209..06f5761fccb6 100644 > --- a/drivers/block/null_blk_main.c > +++ b/drivers/block/null_blk_main.c > @@ -848,7 +848,7 @@ static struct nullb_page *null_insert_page(struct > nullb *nullb, > > spin_unlock_irq(&nullb->lock); > > - t_page = null_alloc_page(GFP_NOIO); > + t_page = null_alloc_page(GFP_NOIO | __GFP_ZERO); > if (!t_page) > goto out_lock; > -- Alexander Potapenko Software Engineer Google Germany GmbH Erika-Mann-Straße, 33 80636 München Geschäftsführer: Paul Manicle, Halimah DeLaine Prado Registergericht und -nummer: Hamburg, HRB 86891 Sitz der Gesellschaft: Hamburg ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: null_handle_cmd() doesn't initialize data when reading 2020-05-11 12:58 ` Alexander Potapenko @ 2020-05-11 13:01 ` Damien Le Moal 2020-05-11 13:09 ` Alexander Potapenko 2020-05-11 23:18 ` Bart Van Assche 1 sibling, 1 reply; 14+ messages in thread From: Damien Le Moal @ 2020-05-11 13:01 UTC (permalink / raw) To: Alexander Potapenko, Bart Van Assche Cc: Chaitanya Kulkarni, Jens Axboe, linux-block, Willem de Bruijn, Dmitriy Vyukov On 2020/05/11 21:58, Alexander Potapenko wrote: > On Sun, May 10, 2020 at 6:20 PM Bart Van Assche <bvanassche@acm.org> wrote: >> >> On 2020-05-10 03:03, Alexander Potapenko wrote: >>> Thanks for the explanation! >>> The code has changed recently, and my patch does not apply anymore, >>> yet the problem still persists. >>> I ended up just calling null_handle_rq() at the end of >>> null_process_cmd(), but we probably need a cleaner fix. >> >> Does this (totally untested) patch help? copy_to_nullb() guarantees that >> it will write some data to the pages that it allocates but does not >> guarantee yet that all data of the pages it allocates is initialized. > > No, this does not help. Apparently null_insert_page() is never called > in this scenario. > If I modify __page_cache_alloc() to allocate zero-initialized pages, > the reports go away. > This means there's no other uninitialized buffer that's copied to the > page cache, the nullb driver just forgets to write anything to the > page cache. Can you describe again the problem you are seeing please ? I can't find the first email of this thread and forgot what the problem is. > >> diff --git a/drivers/block/null_blk_main.c b/drivers/block/null_blk_main.c >> index 8efd8778e209..06f5761fccb6 100644 >> --- a/drivers/block/null_blk_main.c >> +++ b/drivers/block/null_blk_main.c >> @@ -848,7 +848,7 @@ static struct nullb_page *null_insert_page(struct >> nullb *nullb, >> >> spin_unlock_irq(&nullb->lock); >> >> - t_page = null_alloc_page(GFP_NOIO); >> + t_page = null_alloc_page(GFP_NOIO | __GFP_ZERO); >> if (!t_page) >> goto out_lock; >> > > -- Damien Le Moal Western Digital Research ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: null_handle_cmd() doesn't initialize data when reading 2020-05-11 13:01 ` Damien Le Moal @ 2020-05-11 13:09 ` Alexander Potapenko 0 siblings, 0 replies; 14+ messages in thread From: Alexander Potapenko @ 2020-05-11 13:09 UTC (permalink / raw) To: Damien Le Moal Cc: Bart Van Assche, Chaitanya Kulkarni, Jens Axboe, linux-block, Willem de Bruijn, Dmitriy Vyukov > > Can you describe again the problem you are seeing please ? I can't find the > first email of this thread and forgot what the problem is. > Yes, sorry. The original message was: >> I'm debugging an issue in nullb driver reported by KMSAN at QEMU startup. >> There are numerous reports like the one below when checking nullb for >> different partition types. >> Basically, read_dev_sector() allocates a cache page which is then >> wrapped into a bio and passed to the device driver, but never >> initialized. >> I've tracked the problem down to a call to null_handle_cmd(cmd, >> /*sector*/0, /*nr_sectors*/8, /*op*/0). >> Turns out all the if-branches in this function are skipped, so neither >> of null_handle_throttled(), null_handle_flush(), >> null_handle_badblocks(), null_handle_memory_backed(), >> null_handle_zoned() is executed, and we proceed directly to >> nullb_complete_cmd(). >> As a result, the pages read from the nullb device are never >> initialized, at least at boot time. >> How can we fix this? Today null_handle_cmd() looks different, but the problem is still manifesting. KMSAN reports look as follows: ===================================================== BUG: KMSAN: uninit-value in adfspart_check_ICS+0xb8e/0xde0 CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.7.0-rc4+ #4177 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:77 dump_stack+0x1c9/0x220 lib/dump_stack.c:118 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118 __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215 adfspart_check_ICS+0xb8e/0xde0 block/partitions/acorn.c:364 check_partition block/partitions/core.c:140 blk_add_partitions+0x86a/0x2560 block/partitions/core.c:571 bdev_disk_changed+0x5c2/0xa30 fs/block_dev.c:1543 __blkdev_get+0x1195/0x2280 fs/block_dev.c:1646 blkdev_get+0x219/0x6b0 fs/block_dev.c:1748 register_disk block/genhd.c:763 __device_add_disk+0x15b5/0x20a0 block/genhd.c:853 device_add_disk+0x90/0xa0 block/genhd.c:871 add_disk ./include/linux/genhd.h:294 null_gendisk_register drivers/block/null_blk_main.c:1628 null_add_dev+0x2eaa/0x35d0 drivers/block/null_blk_main.c:1803 null_init+0x6c5/0xd8d drivers/block/null_blk_main.c:1888 do_one_initcall+0x4c9/0x930 init/main.c:1160 ... Uninit was created at: kmsan_save_stack_with_flags+0x3c/0x90 mm/kmsan/kmsan.c:144 kmsan_internal_alloc_meta_for_pages mm/kmsan/kmsan_shadow.c:280 kmsan_alloc_page+0xb9/0x180 mm/kmsan/kmsan_shadow.c:304 __alloc_pages_nodemask+0xc0e/0x5dd0 mm/page_alloc.c:4848 __alloc_pages ./include/linux/gfp.h:504 alloc_page_interleave mm/mempolicy.c:2161 alloc_pages_current+0x2e7/0x990 mm/mempolicy.c:2293 alloc_pages ./include/linux/gfp.h:540 __page_cache_alloc+0x95/0x310 mm/filemap.c:959 do_read_cache_page+0x293/0x1510 mm/filemap.c:2752 read_cache_page+0xf3/0x110 mm/filemap.c:2867 read_mapping_page ./include/linux/pagemap.h:397 read_part_sector+0x156/0x560 block/partitions/core.c:643 adfspart_check_ICS+0xa0/0xde0 block/partitions/acorn.c:360 check_partition block/partitions/core.c:140 blk_add_partitions+0x86a/0x2560 block/partitions/core.c:571 bdev_disk_changed+0x5c2/0xa30 fs/block_dev.c:1543 __blkdev_get+0x1195/0x2280 fs/block_dev.c:1646 blkdev_get+0x219/0x6b0 fs/block_dev.c:1748 register_disk block/genhd.c:763 __device_add_disk+0x15b5/0x20a0 block/genhd.c:853 device_add_disk+0x90/0xa0 block/genhd.c:871 add_disk ./include/linux/genhd.h:294 null_gendisk_register drivers/block/null_blk_main.c:1628 null_add_dev+0x2eaa/0x35d0 drivers/block/null_blk_main.c:1803 null_init+0x6c5/0xd8d drivers/block/null_blk_main.c:1888 do_one_initcall+0x4c9/0x930 init/main.c:1160 ===================================================== ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: null_handle_cmd() doesn't initialize data when reading 2020-05-11 12:58 ` Alexander Potapenko 2020-05-11 13:01 ` Damien Le Moal @ 2020-05-11 23:18 ` Bart Van Assche 2020-05-12 1:25 ` Bart Van Assche 1 sibling, 1 reply; 14+ messages in thread From: Bart Van Assche @ 2020-05-11 23:18 UTC (permalink / raw) To: Alexander Potapenko Cc: Chaitanya Kulkarni, Damien Le Moal, Jens Axboe, linux-block, Willem de Bruijn, Dmitriy Vyukov On 2020-05-11 05:58, Alexander Potapenko wrote: > On Sun, May 10, 2020 at 6:20 PM Bart Van Assche <bvanassche@acm.org> wrote: >> >> On 2020-05-10 03:03, Alexander Potapenko wrote: >>> Thanks for the explanation! >>> The code has changed recently, and my patch does not apply anymore, >>> yet the problem still persists. >>> I ended up just calling null_handle_rq() at the end of >>> null_process_cmd(), but we probably need a cleaner fix. >> >> Does this (totally untested) patch help? copy_to_nullb() guarantees that >> it will write some data to the pages that it allocates but does not >> guarantee yet that all data of the pages it allocates is initialized. > > No, this does not help. Apparently null_insert_page() is never called > in this scenario. > If I modify __page_cache_alloc() to allocate zero-initialized pages, > the reports go away. > This means there's no other uninitialized buffer that's copied to the > page cache, the nullb driver just forgets to write anything to the > page cache. Hi Alexander, I had misread the email at the start of this thread. My patch only affects the "memory backed" mode while the email at the start of this thread explains that the KMSAN report refers to the memory_backed == 0 mode. Anyway, can you give the patch below a try? Thanks, Bart. diff --git a/drivers/block/null_blk_main.c b/drivers/block/null_blk_main.c index 06f5761fccb6..682b38ccef57 100644 --- a/drivers/block/null_blk_main.c +++ b/drivers/block/null_blk_main.c @@ -1250,8 +1250,36 @@ static inline blk_status_t null_handle_memory_backed(struct nullb_cmd *cmd, return errno_to_blk_status(err); } +static void nullb_zero_data_buffer(const struct request *rq) +{ + struct req_iterator iter; + struct bio_vec bvec; + struct page *page; + void *kaddr; + u32 offset, left, len; + + rq_for_each_bvec(bvec, rq, iter) { + page = bvec.bv_page; + offset = bvec.bv_offset; + left = bvec.bv_len; + while (left) { + kaddr = kmap_atomic(page); + len = min_t(u32, left, PAGE_SIZE - offset); + memset(kaddr + offset, 0, len); + kunmap_atomic(kaddr); + page++; + left -= len; + offset = 0; + } + } +} + +/* Complete a request. Only called if dev->memory_backed == 0. */ static inline void nullb_complete_cmd(struct nullb_cmd *cmd) { + if (req_op(cmd->rq) == REQ_OP_READ) + nullb_zero_data_buffer(cmd->rq); + /* Complete IO by inline, softirq or timer */ switch (cmd->nq->dev->irqmode) { case NULL_IRQ_SOFTIRQ: ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: null_handle_cmd() doesn't initialize data when reading 2020-05-11 23:18 ` Bart Van Assche @ 2020-05-12 1:25 ` Bart Van Assche 2020-05-12 1:42 ` Damien Le Moal 0 siblings, 1 reply; 14+ messages in thread From: Bart Van Assche @ 2020-05-12 1:25 UTC (permalink / raw) To: Alexander Potapenko Cc: Chaitanya Kulkarni, Damien Le Moal, Jens Axboe, linux-block, Willem de Bruijn, Dmitriy Vyukov [-- Attachment #1: Type: text/plain, Size: 211 bytes --] On 2020-05-11 16:18, Bart Van Assche wrote: > Anyway, can you give the patch below a try? The patch in my previous email handles MQ mode but not bio mode. The attached patch should handle both. Thanks, Bart. [-- Attachment #2: 0001-null_blk-Zero-initialize-read-buffers.patch --] [-- Type: text/x-patch, Size: 1880 bytes --] From f3fcdf9000226dc1354557eae53d4541d4059d03 Mon Sep 17 00:00:00 2001 From: Bart Van Assche <bvanassche@acm.org> Date: Mon, 11 May 2020 15:21:02 -0700 Subject: [PATCH] null_blk: Zero-initialize read buffers --- drivers/block/null_blk_main.c | 45 +++++++++++++++++++++++++++++++++++ 1 file changed, 45 insertions(+) diff --git a/drivers/block/null_blk_main.c b/drivers/block/null_blk_main.c index 06f5761fccb6..39dd82683e10 100644 --- a/drivers/block/null_blk_main.c +++ b/drivers/block/null_blk_main.c @@ -1250,8 +1250,53 @@ static inline blk_status_t null_handle_memory_backed(struct nullb_cmd *cmd, return errno_to_blk_status(err); } +static void nullb_zero_bvec(const struct bio_vec *bvec) +{ + struct page *page = bvec->bv_page; + u32 offset = bvec->bv_offset; + u32 left = bvec->bv_len; + + while (left) { + u32 len = min_t(u32, left, PAGE_SIZE - offset); + void *kaddr; + + kaddr = kmap_atomic(page); + memset(kaddr + offset, 0, len); + kunmap_atomic(kaddr); + page++; + left -= len; + offset = 0; + } +} + +static void nullb_zero_bio_data_buffer(struct bio *bio) +{ + struct bvec_iter iter; + struct bio_vec bvec; + + bio_for_each_bvec(bvec, bio, iter) + nullb_zero_bvec(&bvec); +} + +static void nullb_zero_rq_data_buffer(const struct request *rq) +{ + struct req_iterator iter; + struct bio_vec bvec; + + rq_for_each_bvec(bvec, rq, iter) + nullb_zero_bvec(&bvec); +} + +/* Complete a request. Only called if dev->memory_backed == 0. */ static inline void nullb_complete_cmd(struct nullb_cmd *cmd) { + struct nullb_device *dev = cmd->nq->dev; + + if (dev->queue_mode == NULL_Q_BIO && bio_op(cmd->bio) == REQ_OP_READ) + nullb_zero_bio_data_buffer(cmd->bio); + else if (req_op(cmd->rq) == REQ_OP_READ) + nullb_zero_rq_data_buffer(cmd->rq); + /* Complete IO by inline, softirq or timer */ switch (cmd->nq->dev->irqmode) { case NULL_IRQ_SOFTIRQ: ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: null_handle_cmd() doesn't initialize data when reading 2020-05-12 1:25 ` Bart Van Assche @ 2020-05-12 1:42 ` Damien Le Moal 2020-05-12 2:43 ` Bart Van Assche 0 siblings, 1 reply; 14+ messages in thread From: Damien Le Moal @ 2020-05-12 1:42 UTC (permalink / raw) To: Bart Van Assche, Alexander Potapenko Cc: Chaitanya Kulkarni, Jens Axboe, linux-block, Willem de Bruijn, Dmitriy Vyukov On 2020/05/12 10:25, Bart Van Assche wrote: > On 2020-05-11 16:18, Bart Van Assche wrote: >> Anyway, can you give the patch below a try? > > The patch in my previous email handles MQ mode but not bio mode. The > attached patch should handle both. > > Thanks, > > Bart. > Bart, The patch looks good to me. However, I have one concern regarding the performance impact of this. When nullblk is used to benchmark the block IO stack overhead, doing this zeroing unconditionally will likely significantly impact measured performance. So may be this zeroing feature should be driven by a modprobe/configfs option ? Doing so, we can keep it off by default, preserving performance, and turn it on when needed as in Alexander use case. Thoughts ? -- Damien Le Moal Western Digital Research ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: null_handle_cmd() doesn't initialize data when reading 2020-05-12 1:42 ` Damien Le Moal @ 2020-05-12 2:43 ` Bart Van Assche 2020-05-12 3:23 ` Damien Le Moal 0 siblings, 1 reply; 14+ messages in thread From: Bart Van Assche @ 2020-05-12 2:43 UTC (permalink / raw) To: Damien Le Moal, Alexander Potapenko Cc: Chaitanya Kulkarni, Jens Axboe, linux-block, Willem de Bruijn, Dmitriy Vyukov On 2020-05-11 18:42, Damien Le Moal wrote: > The patch looks good to me. However, I have one concern regarding the > performance impact of this. When nullblk is used to benchmark the block IO stack > overhead, doing this zeroing unconditionally will likely significantly impact > measured performance. So may be this zeroing feature should be driven by a > modprobe/configfs option ? Doing so, we can keep it off by default, preserving > performance, and turn it on when needed as in Alexander use case. > > Thoughts ? Hi Damien, Does the current implementation of null_blk allow one process to access data that was generated by another process? If so, does that behavior count as a security bug? I am aware of the performance impact of the patch attached to my previous email. I have not made the zeroing behavior optional because I'm concerned about the security implications of doing that. Bart. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: null_handle_cmd() doesn't initialize data when reading 2020-05-12 2:43 ` Bart Van Assche @ 2020-05-12 3:23 ` Damien Le Moal 0 siblings, 0 replies; 14+ messages in thread From: Damien Le Moal @ 2020-05-12 3:23 UTC (permalink / raw) To: Bart Van Assche, Alexander Potapenko Cc: Chaitanya Kulkarni, Jens Axboe, linux-block, Willem de Bruijn, Dmitriy Vyukov On 2020/05/12 11:43, Bart Van Assche wrote: > On 2020-05-11 18:42, Damien Le Moal wrote: >> The patch looks good to me. However, I have one concern regarding the >> performance impact of this. When nullblk is used to benchmark the block IO stack >> overhead, doing this zeroing unconditionally will likely significantly impact >> measured performance. So may be this zeroing feature should be driven by a >> modprobe/configfs option ? Doing so, we can keep it off by default, preserving >> performance, and turn it on when needed as in Alexander use case. >> >> Thoughts ? > > Hi Damien, > > Does the current implementation of null_blk allow one process to access > data that was generated by another process? If so, does that behavior > count as a security bug? null_blk not changing in any way the buffer pages for reads may have implications in this area. I am not sure, I would need to go back read through the page cache read path to see. There is page zeroing going on at that level (e.g. reading a file hole, reading after eof) but not sure if that data leak protection applies to nullblk or raw block device file accesses in general. Likely not. Raw block device file accesses are normally reserved to root user only for a reason... > I am aware of the performance impact of the patch attached to my > previous email. I have not made the zeroing behavior optional because > I'm concerned about the security implications of doing that. Understood. But since null_blk is essentially a test tool, I wonder if security should be a concern. Personally, I definitely would privilege performance aspects over security for null_blk, but I am not running it in a sensitive environment either... I think it may be good to involve Jens and ask him about his thoughts on the subject. -- Damien Le Moal Western Digital Research ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2020-05-12 3:23 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-11-15 10:16 null_handle_cmd() doesn't initialize data when reading Alexander Potapenko 2019-11-20 23:12 ` Jens Axboe 2019-11-22 11:58 ` Alexander Potapenko 2019-11-25 4:01 ` Chaitanya Kulkarni 2020-05-10 10:03 ` Alexander Potapenko 2020-05-10 16:20 ` Bart Van Assche 2020-05-11 12:58 ` Alexander Potapenko 2020-05-11 13:01 ` Damien Le Moal 2020-05-11 13:09 ` Alexander Potapenko 2020-05-11 23:18 ` Bart Van Assche 2020-05-12 1:25 ` Bart Van Assche 2020-05-12 1:42 ` Damien Le Moal 2020-05-12 2:43 ` Bart Van Assche 2020-05-12 3:23 ` Damien Le Moal
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).