qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: John Snow <jsnow@redhat.com>
To: Tony Asleson <tasleson@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>, qemu-devel <qemu-devel@nongnu.org>,
	Qemu-block <qemu-block@nongnu.org>
Subject: Re: [RFC 4/4] ahci media error reporting
Date: Thu, 19 Sep 2019 16:43:57 -0400	[thread overview]
Message-ID: <df07a621-8515-2414-2f59-a7eb7eebd75b@redhat.com> (raw)
In-Reply-To: <20190919194847.18518-5-tasleson@redhat.com>



On 9/19/19 3:48 PM, Tony Asleson wrote:
> Initial attempt at returning a media error for ahci.  This is certainly
> wrong and needs serious improvement.
> 

Hi; I have the unfortunate distinction of being the AHCI maintainer.
Please CC me on future revisions and discussion if you are interacting
with my problem child.

Also remember to CC qemu-block.

> Signed-off-by: Tony Asleson <tasleson@redhat.com>
> ---
>  hw/ide/ahci.c | 27 +++++++++++++++++++++++++++
>  1 file changed, 27 insertions(+)
> 
> diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
> index d45393c019..f487764106 100644
> --- a/hw/ide/ahci.c
> +++ b/hw/ide/ahci.c
> @@ -36,6 +36,7 @@
>  #include "hw/ide/internal.h"
>  #include "hw/ide/pci.h"
>  #include "ahci_internal.h"
> +#include "block/error_inject.h"
>  
>  #include "trace.h"
>  
> @@ -999,6 +1000,22 @@ static void ncq_err(NCQTransferState *ncq_tfs)
>      ncq_tfs->used = 0;
>  }
>  
> +/*
> + * Figure out correct way to report media error, this is at best a guess
> + * and based on the output of linux kernel, not even remotely close.
> + */

Depends on what kind of error, exactly, you're trying to report, and at
what level. (AHCI, NCQ, SATA, ATA?)

Keep in mind that you've inserted an error path for SATA drives using
NCQ, but seemingly haven't touched SATA drives not using NCQ, or ATAPI
devices using either PATA/SATA, or ATA drives on PATA.

> +static void ncq_media_err(NCQTransferState *ncq_tfs, uint64_t err_sector)
> +{
> +    IDEState *ide_state = &ncq_tfs->drive->port.ifs[0];
> +
> +    ide_state->error = ECC_ERR;
> +    ide_state->status = READY_STAT | ERR_STAT;
> +    ncq_tfs->drive->port_regs.scr_err |= (1 << ncq_tfs->tag);
> +    ncq_tfs->lba = err_sector;
> +    qemu_sglist_destroy(&ncq_tfs->sglist);
> +    ncq_tfs->used = 0;
> +}
> +

If you are definitely very sure you only want an ide_state error
difference, you could just as well call ncq_err and then patch
ide_state->error.

(I am not sure that's what you want, but... maybe it is?)

I'd have to check -- because I can't say the AHCI emulator was designed
so much as congealed -- but you might need calls to ncq_finish.

usually, ncq_cb handles the return from any NCQ command and will call
ncq_err and ncq_finish as appropriate to tidy up the command.

It might be a mistake that execute_ncq_command issues ncq_err in the
`default` arm of the switch statement without a call to finish.

If we do call ncq_finish from this context I'm not sure if we want
block_acct_done here unconditionally. We may not have started a block
accounting operation if we never started a backend operation. Everything
else looks about right to me.


>  static void ncq_finish(NCQTransferState *ncq_tfs)
>  {
>      /* If we didn't error out, set our finished bit. Errored commands
> @@ -1065,6 +1082,8 @@ static void execute_ncq_command(NCQTransferState *ncq_tfs)
>  {
>      AHCIDevice *ad = ncq_tfs->drive;
>      IDEState *ide_state = &ad->port.ifs[0];
> +    uint64_t error_sector = 0;
> +    char device_id[32];
>      int port = ad->port_no;
>  
>      g_assert(is_ncq(ncq_tfs->cmd));
> @@ -1072,6 +1091,14 @@ static void execute_ncq_command(NCQTransferState *ncq_tfs)
>  
>      switch (ncq_tfs->cmd) {
>      case READ_FPDMA_QUEUED:
> +        sprintf(device_id, "%lu", ide_state->wwn);

This seems suspicious for your design in general, but I'd like to not
run sprintf to a buffer in the hotpath for NCQ.

If you need this, I'd do it when wwn is set and just grab it from the
state structure.

> +
> +        if (error_in_read(device_id, ncq_tfs->lba,
> +                ncq_tfs->sector_count, &error_sector)) {
> +            ncq_media_err(ncq_tfs, error_sector);
> +            return;
> +        }
> +

One of the downsides to trying to trigger read error injections
per-device instead of per-node is that now you have to goof around with
device-specific code everywhere.

I suppose from your cover letter you *WANT* device-specific error
exercising, which would necessitate a different design from blkdebug,
but it means you have to add support for the framework per-device and it
might be very tricky to get right.

>          trace_execute_ncq_command_read(ad->hba, port, ncq_tfs->tag,
>                                         ncq_tfs->sector_count, ncq_tfs->lba);
>          dma_acct_start(ide_state->blk, &ncq_tfs->acct,
> 


  reply	other threads:[~2019-09-19 20:57 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-19 19:48 [RFC 0/4] POC: Generating realistic block errors Tony Asleson
2019-09-19 19:48 ` [RFC 1/4] Add qapi for block error injection Tony Asleson
2019-09-20  9:03   ` Philippe Mathieu-Daudé
2019-09-20 15:17     ` Tony Asleson
2019-09-19 19:48 ` [RFC 2/4] SCSI media error reporting Tony Asleson
2019-09-19 19:48 ` [RFC 3/4] NVMe " Tony Asleson
2019-09-19 19:48 ` [RFC 4/4] ahci " Tony Asleson
2019-09-19 20:43   ` John Snow [this message]
2019-09-19 21:49     ` Tony Asleson
2019-09-20 17:22       ` John Snow
2019-09-20  8:43     ` Kevin Wolf
2019-09-20 16:18       ` John Snow
2019-09-20 19:25         ` Tony Asleson
2019-09-20 19:29           ` John Snow
2019-09-20  8:36 ` [RFC 0/4] POC: Generating realistic block errors Kevin Wolf
2019-09-20 16:41   ` Tony Asleson
2019-09-20 17:08     ` Eric Blake
2019-09-20 19:15       ` Tony Asleson
2019-09-20 18:11     ` Kevin Wolf
2019-09-20 18:55       ` Tony Asleson
2019-09-30 14:54         ` Kevin Wolf
2019-09-20  9:22 ` Stefan Hajnoczi
2019-09-20 17:28   ` Tony Asleson
2019-11-14 15:47     ` Tony Asleson
2019-11-21 10:30       ` Stefan Hajnoczi
2019-11-21 11:12         ` Kevin Wolf
2019-11-26 18:19         ` Tony Asleson
2019-11-26 19:28           ` Kevin Wolf
2019-09-20  9:25 ` Stefan Hajnoczi
2019-09-20 14:41 ` no-reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=df07a621-8515-2414-2f59-a7eb7eebd75b@redhat.com \
    --to=jsnow@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=tasleson@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).