qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Stefan Hajnoczi <stefanha@gmail.com>
To: Tony Asleson <tasleson@redhat.com>
Cc: kwolf@redhat.com, qemu-devel@nongnu.org
Subject: Re: [RFC 0/4] POC: Generating realistic block errors
Date: Fri, 20 Sep 2019 10:22:26 +0100	[thread overview]
Message-ID: <20190920092226.GH14365@stefanha-x1.localdomain> (raw)
In-Reply-To: <20190919194847.18518-1-tasleson@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 2898 bytes --]

On Thu, Sep 19, 2019 at 02:48:43PM -0500, Tony Asleson wrote:
> For a long time I thought that VMs could be a great way to improve OS
> code quality by modifying the simulated hardware to return errors to
> exercise error paths in the OSs, specifically in block devices for right now.
> A number of different approaches are available within the Linux kernel, eg.
> scsi-debug, dm-flakey, and others.  However, I always though it would be best to
> simulate it from the hardware.  To fully exercise the entire stack.  As a
> bonus it's OS agnostic for those projects that cross OSs and it's available
> before the OS even boots.
> 
> This POC needs a lot more work, but it's what I have so far.  Learning QEMU
> internals, plus some of the different bus types has been interesting to say
> the least.  I'm most familiar with SCSI, but the others are foreign to me.
> AHCI/SATA/ATA is very interesting with it's history and the associated code to
> handle it's evolution.
> 
> Eventually I think it would be useful to add functionality for errors on
> write paths, timeouts, and different error behaviors.  Like media error(s) that
> are corrected by a re-write (simulate failed write on power loss), bit rot
> injection etc.  I know a number of these can be solved different ways,
> but embracing them from the VM environment seems ideal to me.  Expanding
> to gather statistics on IO patterns, histograms etc. could be very
> useful too.  Having the ability to start/stop information collection and
> then have access to what happened and in what order could allow for
> better error injection of key FS structures and software RAID solutions too.
> 
> I've recently been informed that blkdebug exists.  From a cursory investigation
> it does appear have overlap, but I haven't given it a try yet.  From looking
> at the code to insert my changes it appears that some of the errors I'm
> generating are different than what for example an EIO on a read_aio does, but
> I'm not sure.  Perhaps some of the other features that I've outlined above
> already exist too in some other capacity of QEMU?

blkdebug is purely at the QEMU block layer level.  It is not aware of
storage controller-specific error information or features.  If you want
to inject NVMe- or SCSI-specific errors that make no sense in QEMU's
block layer, then trying to do it in blkdebug becomes a layering
violation.  This justifies adding a new error injection feature directly
into AHCI, virtio-scsi, NVMe, etc devices.

What I like about blkdebug is the state machine and relatively powerful
tests that this enables.  It makes sense to reuse those for storage
controller-specific error injection too.  In the future we may wish to
reuse it for network cards and other emulated devices too!

Perhaps the error injection "engine" (the core blkdebug code) can be
extracted and reused?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

  parent reply	other threads:[~2019-09-20  9:25 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-19 19:48 [RFC 0/4] POC: Generating realistic block errors Tony Asleson
2019-09-19 19:48 ` [RFC 1/4] Add qapi for block error injection Tony Asleson
2019-09-20  9:03   ` Philippe Mathieu-Daudé
2019-09-20 15:17     ` Tony Asleson
2019-09-19 19:48 ` [RFC 2/4] SCSI media error reporting Tony Asleson
2019-09-19 19:48 ` [RFC 3/4] NVMe " Tony Asleson
2019-09-19 19:48 ` [RFC 4/4] ahci " Tony Asleson
2019-09-19 20:43   ` John Snow
2019-09-19 21:49     ` Tony Asleson
2019-09-20 17:22       ` John Snow
2019-09-20  8:43     ` Kevin Wolf
2019-09-20 16:18       ` John Snow
2019-09-20 19:25         ` Tony Asleson
2019-09-20 19:29           ` John Snow
2019-09-20  8:36 ` [RFC 0/4] POC: Generating realistic block errors Kevin Wolf
2019-09-20 16:41   ` Tony Asleson
2019-09-20 17:08     ` Eric Blake
2019-09-20 19:15       ` Tony Asleson
2019-09-20 18:11     ` Kevin Wolf
2019-09-20 18:55       ` Tony Asleson
2019-09-30 14:54         ` Kevin Wolf
2019-09-20  9:22 ` Stefan Hajnoczi [this message]
2019-09-20 17:28   ` Tony Asleson
2019-11-14 15:47     ` Tony Asleson
2019-11-21 10:30       ` Stefan Hajnoczi
2019-11-21 11:12         ` Kevin Wolf
2019-11-26 18:19         ` Tony Asleson
2019-11-26 19:28           ` Kevin Wolf
2019-09-20  9:25 ` Stefan Hajnoczi
2019-09-20 14:41 ` no-reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190920092226.GH14365@stefanha-x1.localdomain \
    --to=stefanha@gmail.com \
    --cc=kwolf@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=tasleson@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).