From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-0.6 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 24ABAC433E0 for ; Thu, 18 Jun 2020 05:03:24 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id F062221883 for ; Thu, 18 Jun 2020 05:03:23 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="sk0UhljZ" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726927AbgFRFDX (ORCPT ); Thu, 18 Jun 2020 01:03:23 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52664 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725892AbgFRFDW (ORCPT ); Thu, 18 Jun 2020 01:03:22 -0400 Received: from mail-oi1-x241.google.com (mail-oi1-x241.google.com [IPv6:2607:f8b0:4864:20::241]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9AF8CC06174E for ; Wed, 17 Jun 2020 22:03:21 -0700 (PDT) Received: by mail-oi1-x241.google.com with SMTP id x202so3940444oix.11 for ; Wed, 17 Jun 2020 22:03:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=BIDh4J+zTlSxePzOWp66aw0kp0VQetdJJAopqYlkrnc=; b=sk0UhljZUy/GbZrMH6kwfJ0yBWDcWUSyR+mqQ1qG6Quwt58GRqMcZVz+6kFk2OWfZj pjWGnq2VrYa2ZfacWe1aQxRf01+Z6Yht0d9VjYWmRz5ksGw2wApkkzt55A/d5I/macWN EMND5W88PLbUpjpnMxog7sRUN30t4UWPpBWicLUHQW3aexAmoFcsqzw/Xa4YMkjUOLaF r0kF9A+Ayhx552Wvb7SR0gnZzTQ4BaA/J4UdRQFdwmh2WtuxUz2JCeXhhYgU4IaQ6JFM CiLFhKzb4hBqAxalo8TkRcDWkw6vaQ9VR+PR0y7RI3rFEXHmzxY/nBLLABRAysau5ULO m3LA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=BIDh4J+zTlSxePzOWp66aw0kp0VQetdJJAopqYlkrnc=; b=OcpH5Nyppw5Q4dCkhv2Tttr9vgcZA5B1fX6g2JmUiERXYvm/4seHDjFbLIyqmoV7z+ 0z6Ua0DoBoanRW2h1H76yMliknAJ6Rut78T2qoD7l/FGSZP5pRoIW2/Fs+LjhjnZX3vo g86eG063js48EdNRqGaY6UKnwphXBra6c3Wnxdmf2A+/A8MVLywV/yXhecfGzAgS6gx+ iH8aovYkU8HgJ2oICDOwj7+Rxx3KN/krBxBwmLkG7/UUUqfL8778FkixEjtXNGDnFnz7 G1q8AP0ysWl1fd/B7i3otYWutlOJLlXBXM21io2u4JpdEMuFaAMoo1yBTc0Qet1MrksW hDJg== X-Gm-Message-State: AOAM533OARbB48Mcbgp7kZdgpzPtr6dl91mprZGWj2mHtdUxE5uLGXOT dh/SR52G2niSz/VzH+Tx4x26NMaPCyXjXyA7iQ0= X-Google-Smtp-Source: ABdhPJzglgCXc40cKH+SxqWcyveX2n5yURw6wQgxF6GjEwIyodSn4Xa3jiLhZw8vLQXqiH9DceGOvlNZML67x8YxRsk= X-Received: by 2002:aca:2b0d:: with SMTP id i13mr1671714oik.39.1592456600830; Wed, 17 Jun 2020 22:03:20 -0700 (PDT) MIME-Version: 1.0 References: <64d477b6-7bf2-fa0d-b9b4-821285af386e@gmail.com> <2ee11a3c-9785-fa29-afa4-9eeac2248972@gmail.com> In-Reply-To: <2ee11a3c-9785-fa29-afa4-9eeac2248972@gmail.com> From: Elijah Newren Date: Wed, 17 Jun 2020 22:03:09 -0700 Message-ID: Subject: Re: [PATCH 00/10] [RFC] In-tree sparse-checkout definitions To: Derrick Stolee Cc: Derrick Stolee via GitGitGadget , Git Mailing List , Jeff King , Taylor Blau , Jonathan Nieder , Derrick Stolee Content-Type: text/plain; charset="UTF-8" Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org On Wed, Jun 17, 2020 at 8:01 PM Derrick Stolee wrote: > > On 6/17/2020 9:59 PM, Elijah Newren wrote: > > On Wed, Jun 17, 2020 at 6:42 PM Derrick Stolee wrote: > >> > >> On 6/17/2020 7:14 PM, Elijah Newren wrote: > >> You mentioned in another thread that it is a bit unwieldy for a user > >> to rely on a committed (or staged?) file, so adding the ability to > >> check the working directory first is interesting. I wonder how the > >> timing comes into play when changing HEAD to a new commit? Seems > >> tricky, but solvable. > > > > Isn't that essentially the same timing issue that comes into play if > > you only look at the index, and are changing HEAD to a new commit? > > It adds to the complexity. We can inspect the new index for the > in-tree definition and update the skip-worktree bits before actually > changing the working directory to match the new index. However, if > we trust the working directory copy over the index copy, then we > need to also decide if the working directory copy _would_ change > in this move before using it. > > Of course, maybe I'm making it over-complicated. I still know so > little about the index. I got into this feature due to a simple > pattern-matching problem that I could understand, but now I'm > getting lost in index states and dirty statuses. ;) Honestly, I think your first cut for switching branches with this new feature should just be: 1) Do a switch/checkout exactly the way it's done today already: 1a) Load the index and existing sparsity rules (from worktree then index), according to what existed before the branch switch 1b) Do the appropriate 1-way or 2-way merge from unpack_trees() to move to the right branch (as builtin/checkout.c already does) 2) *If* using in-tree sparsity feature, re-load the sparsity rules (from worktree then index) and run the equivalent of `sparse-checkout reapply` Not only does this avoid messing up any code for the normal case, I'm pretty sure this gets the behavior the user expects in all the special cases. There is one big downside, and a little downside. The big downside is that it'll have two progress meters instead of just one (much like sparse-checkout init && sparse-checkout set do today). The little downside is that this isn't optimal from a performance standpoint, for two reasons: (a) some files may be updated in the working tree in step 1b despite the fact that they'll be removed in step 2. (b) step 2 may just be an expensive no-op, in fact I suspect it will be most the time The reason I think the performance isn't a big deal is because: C) the fact that (b) is a problem means (a) is not often an issue -- the sparsity patterns are usually the same. D) Even if the sparsity patterns differ, it's often the case that files within the tree won't change (people tend to only modify a few files per commit, after all, so even switching branches only updates a subset of all files). E) Even if the sparsity patterns differ and files differ and have to be updated, it's still proportional at most to the number of files selected by the sparsity patterns, so it won't feel excessive or slow to the user. F) `sparse-checkout reapply` is pretty cheap, especially if called as a function instead of as a subprocess In my opinion, other than the unfortunate dual progress meters, the only place where this new feature gets hairy is that `git sparse-checkout reapply` (or its libified variant) now needs to know how to "read your new form of sparsity rules". That sounds easy at first but in my opinion it is a bit of a mess. The user could have written garbage to the file and I don't know what to do with garbage, but I think we have to do something sane. (Give up and checkout everything? Give up and checkout nothing but the toplevel directory? Ignore each line we don't understand and treat the rest as stuff that needs to be checked out? Can that last one even be done with the .ini reader?) I don't think saying "well, tough luck the user should know better" works because we already know that merges or rebases or checkout -m could have written garbage to those files (in the form of conflict markers), and users will say that *we* were the ones that filled the file with garbage. So we have to handle it intelligently in some fashion. Thoughts?