
Thursday, January 18, 2024

Recovering Deleted Files From Your Git Working Tree

 Git is designed to assure us, as developers, that we can track a project’s files at different points in time. But what it doesn’t assure us is that those files are always safe along the way. For those of you who have dealt with the sinking feeling that you’ve irrevocably deleted and lost files, Sanmi Akande has a couple of approaches that, in the right situations, may help bring them back.

There are times when mistakes happen, and useful and important files are deleted accidentally or lost from your file system irrevocably (or seemingly, at least). Version control systems make it difficult to permanently lose files, provided they have been either added to the staging area or committed to the repository, because Git allows you to undo or revert changes and access previous versions of the saved files.

It is also possible to erroneously erase files from both the working directory and the Git repository. I’ve certainly done that! I imagine you have, too, if you’re reading this, and if that’s the case, then you will need a way to recover those files.

I have a few methods and strategies you can use to recover your deleted files. Some are more obvious than others, and some are designed for very specific situations. And while it is indeed possible to irrevocably lose a file, even then, you may have a path to at least recover a copy of it with third-party software if it comes to that.

How Git Works With Files #

Before we dive into all of that, let’s explore how your files journey from your local computer to your remote repository.

Your files are initially only located on your computer’s storage, known as your working tree or working directory, and Git has no idea they exist yet. At this point, they are in their most vulnerable state because they are untracked.

Adding files to the staging area (also known as the index) so that Git is aware of them is what the git add <filename> command (or git add -A for all files) is for. Under the hood, when you stage a file, Git hashes its content, creates a blob object from that content, and stores the blob in the .git/objects directory. Run git status to confirm that the files you want to commit have been added to your staging area.

Git hashing and storing files as blobs in the .git/objects directory.
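To see this for yourself, here is a minimal sketch, assuming Git and a POSIX shell are installed. It works in a throwaway repository created with mktemp, and the file name notes.txt is only a hypothetical example. It computes the file's blob hash with git hash-object, stages the file, and then inspects the stored object with git cat-file.

```shell
# Minimal sketch: watch `git add` turn a file's content into a blob object.
# Assumes Git and a POSIX shell; notes.txt is a hypothetical example file.
set -e
repo=$(mktemp -d)   # throwaway repository, so nothing real is touched
cd "$repo"
git init -q

echo "hello" > notes.txt

# Compute the hash Git will use for this content, without writing anything yet.
hash=$(git hash-object notes.txt)
echo "blob hash: $hash"

# Staging the file writes the blob to .git/objects/<first 2 chars>/<remaining chars>.
git add notes.txt
dir=$(printf '%s' "$hash" | cut -c1-2)
rest=$(printf '%s' "$hash" | cut -c3-)
ls -l ".git/objects/$dir/$rest"

git cat-file -t "$hash"   # prints: blob
git cat-file -p "$hash"   # prints: hello
```

Because the hash depends only on the content, two files with identical content share a single blob.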

Once the files are staged, Git is at least aware of them, and we can include them in commits. When you commit, Git creates a tree object that records the directory structure and the blobs in it, plus a commit object that represents the state of the repository at the time the commit happens. The commit object contains the following information:

  • SHA-1 hash of the tree object that represents the state of the repository;
  • SHA-1 hash of the commit’s parent commit object, if it has a parent;
  • Author and committer information;
  • Commit message.
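A minimal sketch for inspecting these objects directly, assuming a throwaway repository; the identity “Example User” and the file a.txt are hypothetical and exist only so the commits succeed. git cat-file -p pretty-prints the raw commit object and the tree it points to.

```shell
# Minimal sketch: look inside a commit object and its tree.
set -e
repo=$(mktemp -d)   # throwaway repository
cd "$repo"
git init -q
git config user.name "Example User"      # hypothetical identity for the demo
git config user.email "user@example.com"
git config commit.gpgsign false

echo "first" > a.txt
git add a.txt
git commit -q -m "First commit"
echo "second" >> a.txt
git commit -q -am "Second commit"

# The commit object: tree, parent, author, committer, then the message.
git cat-file -p HEAD
# The tree it points to lists the blobs (files) in that snapshot.
git cat-file -p "HEAD^{tree}"
```

The very first commit in a repository has no parent line, which is why the parent field only appears “if it has a parent”.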

It’s at this point that the files are git push-ed to the remote repo, wherever you happen to be hosting it, whether it’s GitHub, Beanstalk, Bitbucket, or whatever.

Your file(s)’s journey from your local computer to your remote repo.

How Files Can Get Deleted From A Working Tree #

So, the key pieces we’re talking about are your project’s working tree, staging area, and commits. Files can be deleted at any one of these points, but a file lost from the working tree is the most difficult, and sometimes impossible, to restore.

There are some very specific Git commands or actions that tend to be the biggest culprits when a file is deleted from the working tree.

git rm #

I’m sure you have seen this one before. It’s a command for removing (rm) tracked files from both the working tree and the staging area. It might be the most commonly used command for deleting files.
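A minimal sketch of what git rm does, assuming a throwaway repository; report.txt and the “Example User” identity are hypothetical. The file disappears from disk and the deletion is staged in the same step.

```shell
# Minimal sketch: git rm deletes the file and stages the deletion.
set -e
repo=$(mktemp -d)   # throwaway repository
cd "$repo"
git init -q
git config user.name "Example User"      # hypothetical identity for the demo
git config user.email "user@example.com"
git config commit.gpgsign false

echo "keep me" > report.txt
git add report.txt
git commit -q -m "Add report"

git rm -q report.txt
ls report.txt 2>/dev/null || echo "report.txt is gone from the working tree"
git status --short   # D  report.txt: the deletion is already staged
```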

git reset #

Anytime a reset happens, it’s very possible to lose any files you’ve been working on. There are two types of Git reset that make this possible:

  1. git reset --hard
    This command is sort of a nuclear path for resetting the working tree and the staging area. Any uncommitted changes to tracked files are lost, and tracked files that are not in the target commit are removed from the working tree (untracked files are left alone). If you reset to an earlier commit, the commits after it are dropped from the branch as well, although Git’s reflog keeps track of them for a while.
  2. git reset <filename>
    This is a lot less damaging than a hard reset. It unstages the specified file by resetting its entry in the staging area to match HEAD, and it leaves the copy in your working tree untouched. Any version of the file you had staged is no longer referenced, but its blob stays in .git/objects for a while, so there’s a path back, which we’ll get to.
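A minimal sketch contrasting the two resets, assuming a throwaway repository; tracked.txt, untracked.txt, and the identity are hypothetical. After these steps, the staged “v2” content is no longer referenced by anything, which is exactly the kind of orphaned blob that git fsck can find later.

```shell
# Minimal sketch: git reset <file> versus git reset --hard.
set -e
repo=$(mktemp -d)   # throwaway repository
cd "$repo"
git init -q
git config user.name "Example User"      # hypothetical identity for the demo
git config user.email "user@example.com"
git config commit.gpgsign false

echo "v1" > tracked.txt
git add tracked.txt
git commit -q -m "Add tracked.txt"

# 1. git reset <file>: unstages the file; the working tree copy is untouched.
echo "v2" > tracked.txt
git add tracked.txt
git reset -q tracked.txt
cat tracked.txt                  # still prints v2
git diff --cached --quiet && echo "nothing is staged any more"

# 2. git reset --hard: uncommitted changes to tracked files are discarded,
#    but untracked files are left alone.
echo "scratch" > untracked.txt
git reset -q --hard HEAD
cat tracked.txt                  # prints v1: the v2 edit is gone
ls untracked.txt                 # still there
```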

git clean #

This removes untracked files from the working tree. Untracked files are not in the Git staging area and are not really part of the repository. They’re typically temporary files or files that have not yet been added to the repository.

One key distinction with a clean command is that, by default, it will not remove files matched by the project’s .gitignore file, nor will it remove files that have been added to the staging area, nor ones that have already been committed. This can be useful for cleaning up your working tree after you have finished working on a project and you want to remove all of the temporary files that you created.

Like git reset, there are different variations of git clean that remove files in different ways:

  • git clean <filename>
    Used to remove specific untracked files from the working tree.
  • git clean -d
    Also removes untracked directories, not just untracked files.
  • git clean -i
    This one interactively removes files from the working tree. And by that, I mean you will be prompted to confirm removal before it happens, which is a nice safeguard against accidents.
  • git clean -n
    This is a dry run option and will show you the files that would be removed if you were to run the original git clean command. In other words, it doesn’t actually remove anything but lets you know what would be removed if you were to run an actual clean.
  • git clean -f
    This one forces the git clean command to actually remove untracked files from the working tree. Because Git’s clean.requireForce setting defaults to true, git clean refuses to delete anything without this flag (unless you use -i or -n). It still respects .gitignore, so ignored files are kept unless you add -x or -X.
  • git clean -f -d
    Running this command is a lot like git clean -f but wipes out untracked directories as well.
  • git clean -x
    This removes all untracked files, including ignored ones such as build products. It is best used when you want to wipe your working tree clean and test a fresh build.
  • git clean -X
    This only removes files ignored by Git.
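A minimal sketch that exercises a few of these variants with dry runs before the real clean, assuming a throwaway repository; scratch.txt, tmp/, and build/ are hypothetical names. Running git clean -n first is a cheap habit that shows exactly what a real clean would delete.

```shell
# Minimal sketch: dry runs first, then a real clean.
set -e
repo=$(mktemp -d)   # throwaway repository
cd "$repo"
git init -q
git config user.name "Example User"      # hypothetical identity for the demo
git config user.email "user@example.com"
git config commit.gpgsign false

echo "build/" > .gitignore
git add .gitignore
git commit -q -m "Ignore build output"

touch scratch.txt                       # untracked file
mkdir -p tmp && touch tmp/notes.txt     # untracked directory
mkdir -p build && touch build/app.bin   # ignored directory

git clean -n          # dry run: would remove scratch.txt only (tmp/ needs -d)
git clean -n -d       # dry run: would remove scratch.txt and tmp/
git clean -n -d -X    # dry run: would remove only the ignored build/ directory
git clean -f -d       # really delete scratch.txt and tmp/; ignored build/ survives
```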

Of course, I’m merely summarizing what you can already find in Git’s documentation. That’s where you can get the best information about the specific details and nuances of git clean and its variants.

Manually Removing Files #

Yes, it’s possible! You can manually delete the files and directories from your working tree using your computer’s file manager. The good news is that this will not remove the files from the staging area. Also, it’s quite possible you can undo that action with a simple CMD + Z/CTRL + Z if no other action has happened.

It is important to note that manually removing files from the working tree is a destructive operation. Once you have removed a file from the working tree that was never staged or committed, it is almost impossible to undo the operation completely from a Git perspective. As a result, it is crucial to make sure that you really want to remove a file before you go this route.
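When the deleted file had been staged, though, the index still holds a copy. A minimal sketch, assuming a throwaway repository; notes.txt and the identity are hypothetical, and rm stands in for deleting the file through a file manager.

```shell
# Minimal sketch: bring back a staged file that was deleted by hand.
set -e
repo=$(mktemp -d)   # throwaway repository
cd "$repo"
git init -q
git config user.name "Example User"      # hypothetical identity for the demo
git config user.email "user@example.com"
git config commit.gpgsign false
echo "readme" > README
git add README
git commit -q -m "Initial commit"

echo "draft" > notes.txt
git add notes.txt      # staged, never committed
rm notes.txt           # simulate deleting it from the file manager
git status --short     # AD notes.txt: added to the index, deleted from the working tree

# Copy the staged version from the index back into the working tree.
git checkout -- notes.txt   # on Git 2.23+, `git restore notes.txt` does the same
```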

But mistakes happen! So, let’s look at a variety of commands, strategies, and — if needed — apps that could reasonably recover deleted files from a working directory.

How Files Can Be Recovered After Being Deleted #

Git commands like git checkout, git reset, git restore, and git reflog can be helpful for restoring files that you have either previously added to the staging area or committed to your repository.

git checkout #

If you have not committed the changes that deleted the files and directories, then you can use the git checkout command to check out files from a previous commit, branch, or tag. This overwrites the specified paths in the working tree with their contents from that commit, branch, or tag, so any deleted files and directories are restored.

git checkout HEAD -- <filename>

That restores the file as it was in the last commit that was made. But let’s say you’ve made several commits since the file was deleted. If that’s the case, try checking out the file from a specific commit made before the deletion by providing that commit’s hash:

git checkout <commit-hash> -- <filename>

Oh, you’re not sure which file it is, or there are more files than you want to type out? You can check out the entire working tree by omitting the filename:

git checkout <commit-hash>

Be aware that this moves HEAD itself to that commit, leaving you in a “detached HEAD” state, so copy the files you need (or create a branch there) before switching back.
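A minimal sketch of both cases, assuming a throwaway repository; data.txt, other.txt, and the identity are hypothetical. Instead of looking up the hash by hand, it asks git log for the commit that deleted the file (--diff-filter=D) and checks the file out from that commit’s parent.

```shell
# Minimal sketch: restore a deleted file with git checkout.
set -e
repo=$(mktemp -d)   # throwaway repository
cd "$repo"
git init -q
git config user.name "Example User"      # hypothetical identity for the demo
git config user.email "user@example.com"
git config commit.gpgsign false

echo "important" > data.txt
git add data.txt
git commit -q -m "Add data.txt"

# Case 1: deleted but not committed, so restore it from the last commit.
rm data.txt
git checkout HEAD -- data.txt

# Case 2: the deletion was committed, then more work happened.
git rm -q data.txt
git commit -q -m "Remove data.txt"
echo "more" > other.txt
git add other.txt
git commit -q -m "Unrelated work"

# Find the commit that deleted the file, then check it out from that commit's parent.
deleted_in=$(git log -1 --format=%H --diff-filter=D -- data.txt)
git checkout "$deleted_in^" -- data.txt
git status --short   # A  data.txt: restored and staged, ready to commit
```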

git reset #

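If you have committed the changes that deleted the files and directories, then you can use the git reset command to move the HEAD pointer (and your current branch) back to a previous commit. Adding the --hard flag also overwrites the working tree and staging area with the contents of that commit, restoring any deleted files and directories in the process. A plain git reset <commit-hash> only resets the staging area and leaves the working tree untouched. Be careful: a hard reset also discards any uncommitted changes you have.

git reset --hard <commit-hash>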
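A minimal sketch, assuming a throwaway repository in which the most recent commit is the one that deleted the file; data.txt and the identity are hypothetical. HEAD~1 is used in place of a specific commit hash.

```shell
# Minimal sketch: undo a committed deletion with a hard reset.
set -e
repo=$(mktemp -d)   # throwaway repository
cd "$repo"
git init -q
git config user.name "Example User"      # hypothetical identity for the demo
git config user.email "user@example.com"
git config commit.gpgsign false

echo "important" > data.txt
git add data.txt
git commit -q -m "Add data.txt"
git rm -q data.txt
git commit -q -m "Remove data.txt (oops)"

# Move the branch back one commit. This drops the deleting commit AND
# discards any uncommitted changes in the working tree.
git reset -q --hard HEAD~1
cat data.txt   # prints: important
```

If the deleting commit has already been pushed, rewriting history like this affects everyone who pulled it; git revert <commit-hash> undoes it with a new commit instead.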

git restore #

If you want to restore specific deleted files and directories without moving HEAD or touching anything else in the working tree, then you can use the git restore command (available since Git 2.23). This command restores files and directories deleted from the staging area or the working tree. Note that it only works for tracked files, meaning that any files that were never git add-ed to the staging area are excluded.

To put a deleted file back into the staging area from the last commit (for example, after git rm), use the --staged option:

git restore --staged <filename>

To restore the working tree copy from the staging area, use --worktree (this is also the default when you pass neither option):

git restore --worktree <filename>

To jump back one commit instead, tell git restore which commit to take the file from with --source:

git restore --source=HEAD~ <filename>

And, of course, you can restore all tracked files in the current directory by passing . in place of a filename (git restore refuses to run without at least one path):

git restore .
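A minimal sketch chaining these options, assuming Git 2.23 or later and a throwaway repository; data.txt and the identity are hypothetical. After git rm, the file is missing from both the index and the working tree, so it is restored in two steps and then swapped for an older version.

```shell
# Minimal sketch: bring a git rm-ed file back with git restore.
set -e
repo=$(mktemp -d)   # throwaway repository
cd "$repo"
git init -q
git config user.name "Example User"      # hypothetical identity for the demo
git config user.email "user@example.com"
git config commit.gpgsign false

echo "v1" > data.txt
git add data.txt
git commit -q -m "Add data.txt"
echo "v2" > data.txt
git commit -q -am "Update data.txt"

git rm -q data.txt                  # gone from the index and the working tree
git restore --staged data.txt       # 1. index entry restored from HEAD
git restore data.txt                # 2. working tree restored from the index
cat data.txt                        # prints: v2

# Take the version from one commit back instead (working tree only).
git restore --source=HEAD~ data.txt
cat data.txt                        # prints: v1
```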

git reflog #

There’s also the git reflog command, which shows a history of all recent HEAD movements. I like this as a way to identify the commit that you want to check out or reset to.

git reflog
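A minimal sketch of the reflog rescuing a commit thrown away by a hard reset, assuming a throwaway repository; data.txt and the identity are hypothetical. HEAD@{1} means “where HEAD was one move ago”.

```shell
# Minimal sketch: undo an accidental hard reset via the reflog.
set -e
repo=$(mktemp -d)   # throwaway repository
cd "$repo"
git init -q
git config user.name "Example User"      # hypothetical identity for the demo
git config user.email "user@example.com"
git config commit.gpgsign false

echo "important" > data.txt
git add data.txt
git commit -q -m "Add data.txt"
echo "more" >> data.txt
git commit -q -am "Extend data.txt"

# The mistake: a hard reset throws away the latest commit.
git reset -q --hard HEAD~1

# The reflog still remembers where HEAD was before the reset.
git reflog -n 3
git reset -q --hard "HEAD@{1}"
```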

Last Resorts #

When files that are neither present in the staging area nor committed are deleted from the working tree, it is commonly accepted that those files are gone forever — or oti lor as we say in Yoruba — without any hope of recovery. So, if for any reason or by error, you delete important files from your project’s working tree without ensuring that they are either in the staging area or have been previously committed, then you may be thinking all hope of getting them back is lost.

But I can assure you, based on my experiences in this situation, that it is usually possible to recover all or most of a project’s lost files. There are two approaches I normally take.

File Recovery Apps #

File recovery tools can recover lost or deleted data from your storage devices. They work by running a deep scan of your device in an attempt to find files and folders that still exist on it, including deleted and lost ones whose data has not yet been overwritten. Once the files have been found, you can then use the data recovery tool to restore/recover the files of your choice to a new location.

Note: Some deleted and lost files may turn up corrupted or damaged, and some may not be found at all, but in my experience using these tools, the majority are found without any corruption or damage.

There are a variety of file recovery tools available, and the “right” one is largely a subjective matter. I could spend an entire post exclusively on the various options, but I’ve selected a few that I have used and feel comfortable at least suggesting as options to look into.

Wondershare Recoverit is capable of recovering more than 1,000 file formats. Its free tier option allows you to run a scan to find files on your computer’s storage, but to actually recover the files, you will have to upgrade to one of its paid plans, starting at a $69.99 annual subscription or a one-time $119.99 license. There’s also a premium plan with enhanced recovery methods for things like videos, as well as repair of corrupted files, which goes well beyond the basic need of recovering a single lost file.

  • Pros: High success rate, free tech support, allows partition recovery.
  • Cons: Free tier is extremely limited.

EaseUS Data Recovery Wizard is perhaps one of the most popular tools out of what’s available. Its free tier option is quite robust, running a deep scan and recovering up to 2GB of data. The difference between that and its paid subscription (starting at $119.95 per year, $169.95 lifetime) is that the paid tier recovers an unlimited amount of data.

  • Pros: Fast deep scans, file preview before recovery, easy to use, generous free tier.
  • Cons: Paid plans are significantly more expensive than other tools, Windows and macOS versions are vastly different, and the macOS software is even more expensive.

DM Disk Editor (DMDE) makes use of a special algorithm that reconstructs directory structures and recovers files by their file signature when recovering solely by the file system proves impossible. DMDE also offers a free tier option, but it is quite limited as you can only recover files from the directory you have selected, and it only recovers up to 4,000 files at a time. Compare that to its paid versions that allow unlimited and unrestricted data recovery. Paid plans start at $20 per year but scale up to $133 per year for more advanced needs that are likely beyond the scope of what you need.

  • Pros: High recovery success rate, generous free tier, reasonable paid tiers if needed.
  • Cons: I personally find the UI to be more difficult to navigate than other apps.

| Software | Operating systems supported | Starting price | File types and formats supported |
|---|---|---|---|
| Wondershare Recoverit | Windows, Mac, Linux (Premium) | $69.99/year | 1,000+ file types and formats |
| EaseUS | Windows, Mac | $99.95/year (Windows), $119.95/year (Mac) | 1,000+ file types and formats |
| DMDE | Windows, Mac, Linux, DOS | $20/year | Supports basic file formats; does not support raw photo files |

As I said, there are many, many more options out there. If you’re reading this and have a favorite app that you use to recover lost files, then please share it in the comments. The more, the merrier!

Last Resort: git fsck #

First off, git fsck is a low-level tool. It only inspects the repository (with --lost-found, it also writes copies of what it finds into .git/lost-found), but its output is terse and easy to misread. It is essential to make sure that you understand how to use the command before relying on it to recover files from the working tree. If you are unsure how to proceed after reading this section, then it is a good idea to consult the Git documentation for additional details on how it is used and when it is best to use it.

That said, git fsck can indeed recover files lost from the working tree in Git and may be your absolute last resort. It works by scanning the Git repository for “dangling” objects, which are objects that nothing else in the repository references. The Git docs define it like this:

dangling object:

“An unreachable object that is not reachable even from other unreachable objects; a dangling object has no references to it from any reference or object in the repository.”

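This can happen if a file was staged with git add but was then unstaged or deleted before it was ever committed, or if a branch is deleted while its commits still exist in the object database. Note that git fsck can only find content that Git stored at some point; a file that was never staged or committed has no blob to recover.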

To recover files lost from the working tree using the git fsck command, follow these steps:

  • Run git fsck --lost-found, which is a special mode of the git fsck command.
    It prints the dangling objects (blobs, commits, trees, and tags) if they exist and writes copies of them into a directory called .git/lost-found. The objects are organized into two subdirectories: /commit holds dangling commits, and /other holds everything else, such as blobs, trees, and tags. For blobs, Git writes the file’s content itself, so each file in /other can be opened directly.
Output from the git fsck --lost-found command. It returns dangling blobs for files I haven’t committed to the repository; each blob represents a file we can recover.
  • Run the git show <dangling_object_hash> command for each dangling object that is printed.
    This prints the object’s original content, so you can identify which dangling blobs correspond to the files that you want to recover.
  • To recover a dangling blob, you can manually copy the content printed in the console when you run git show <dangling_object_hash>, or run git show <dangling_object_hash> > <filename> to save the content of the hashed object to the file you specify. If the dangling object is a commit rather than a blob, you can instead run git checkout <dangling_object_hash> (or create a branch from it with git branch <branch-name> <dangling_object_hash>) to bring its files back into the working tree.
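The steps above can be sketched end to end, assuming a throwaway repository; notes.txt, the text “precious notes”, and the identity are hypothetical. The file is staged (so Git writes a blob), unstaged, and deleted, leaving a dangling blob that git fsck --lost-found recovers. Picking the right blob by searching for known text with grep is just one convenient approach.

```shell
# Minimal sketch: recover a staged-then-lost file with git fsck --lost-found.
set -e
repo=$(mktemp -d)   # throwaway repository
cd "$repo"
git init -q
git config user.name "Example User"      # hypothetical identity for the demo
git config user.email "user@example.com"
git config commit.gpgsign false
echo "readme" > README
git add README
git commit -q -m "Initial commit"

echo "precious notes" > notes.txt
git add notes.txt              # Git writes a blob for this content
git rm -q --cached notes.txt   # ...the file is unstaged again
rm notes.txt                   # ...and deleted: nothing references the blob now

# Report dangling objects and write copies into .git/lost-found/.
git fsck --lost-found

# Dangling blobs land in .git/lost-found/other/, one file per object,
# named after its hash and containing the original content.
for f in .git/lost-found/other/*; do
  echo "== $(basename "$f")"
  git show "$(basename "$f")"
done

# Once you have spotted the right one, write it back into the working tree.
match=$(grep -l "precious notes" .git/lost-found/other/*)
hash=$(basename "$match")
git show "$hash" > notes.txt
```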

Once you have recovered the files that you want to recover, you can commit the changes to the Git repository as if nothing ever happened. Phew! But again, I only advise this approach if you’ve tried everything else and are absolutely at your last resort.

Conclusion #

Now that you know how to recover files lost from your working tree, your mind should be relatively at ease if you ever find yourself in this unfortunate situation. Remember, there’s a good chance you can recover a file that was accidentally deleted from a project.

That said, a better plan is to avoid this situation in the first place. Here are some tips to help you avoid almost irrevocably losing files from your working tree:

  • Commit your files to your Git repository and remote servers as quickly and as often as you create or make changes to them.
    There is no such thing as a “too small” commit.
  • Routinely create backups of your project files.
    This will help you recover your files if you accidentally delete them or your computer crashes.
