From: Tong Zhang
Date: Wed, 1 Jun 2022 18:04:35 -0700
Subject: Re: [RESEND PATCH] hw/dma: fix crash caused by race condition
To: Stefan Hajnoczi
Cc: David Hildenbrand, Tong Zhang, Paolo Bonzini, Peter Xu, Philippe Mathieu-Daudé, "qemu-devel@nongnu.org", Francisco Londono
References: <20220427205056.2522-1-t.zhang2@samsung.com> <0b54d6c7-f56d-1ad2-80b7-d75d1033d67e@redhat.com>

Hi Stefan,

On Wed, Jun 1, 2022 at 6:56 AM Stefan Hajnoczi wrote:
>
> > > This patch makes sense to me. Can you rephrase your concern?
> >
> > The locking is around dbs->io_func().
> >
> > aio_context_acquire(dbs->ctx);
> > dbs->acb = dbs->io_func()
> > aio_context_release(dbs->ctx);
> >
> >
> > So where exactly would the lock that's now still held stop someone from
> > modifying dbs->acb = NULL at the beginning of the function, which seems
> > to be not protected by that lock?
> >
> > Maybe I'm missing some locking magic due to the lock being a recursive lock.
>
> Tong Zhang: Can you share a backtrace of all threads when the
> assertion failure occurs?
>

Sorry, I can't get the trace right now. What I can tell you is that we have
some internal code that uses this DMA-related code. It grabs the dbs->ctx
lock in another thread and could overwrite dbs->acb.

From my understanding, one of the reasons the lock is required here is to
protect dbs->acb, so we cannot reliably test io_func()'s return value after
releasing the lock.

Since this only affects our internal code base and I did not reproduce it
on the master branch, feel free to ignore it.

- Tong

> Stefan
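
For illustration only, the concern can be sketched outside QEMU as follows.
This is a minimal sketch under stated assumptions, not the actual DMA helper
code: DemoState, demo_io_func() and both submit_* functions are hypothetical
names, and a plain pthread mutex stands in for the dbs->ctx lock. The first
function tests the result while the lock is still held; the second re-reads
the shared field after the unlock, which is the window in which another
thread that takes the lock could overwrite it.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the DMA state; only the fields needed here. */
typedef struct {
    pthread_mutex_t lock;   /* plays the role of the dbs->ctx lock */
    void *acb;              /* plays the role of dbs->acb */
} DemoState;

/* Dummy object so that demo_io_func() has something non-NULL to return. */
static int demo_request;

/* Hypothetical I/O submission function; always succeeds in this sketch. */
void *demo_io_func(void)
{
    return &demo_request;
}

/*
 * Test the result while the lock is still held. No other thread that
 * honours the lock can change s->acb between the store and the test.
 * Returns 1 if the submission produced a non-NULL handle.
 */
int submit_checked_under_lock(DemoState *s)
{
    int ok;

    pthread_mutex_lock(&s->lock);
    s->acb = demo_io_func();
    ok = (s->acb != NULL);
    pthread_mutex_unlock(&s->lock);
    return ok;
}

/*
 * Test the result after dropping the lock. The read of s->acb below is
 * unprotected: a thread that takes the lock in between may already have
 * reset the field, so the result can be 0 even though the submission
 * itself succeeded.
 */
int submit_checked_after_unlock(DemoState *s)
{
    pthread_mutex_lock(&s->lock);
    s->acb = demo_io_func();
    pthread_mutex_unlock(&s->lock);
    return s->acb != NULL;  /* racy read */
}
```

With a single thread both variants return 1; they only differ once a second
thread writes s->acb under the same lock.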