This guide shows you how to back up your dev projects to Dropbox or Google Drive (or really any sync service) while ignoring your massive node_modules directories. It leverages the rsync command-line utility. It works on Mac, Linux, and hypothetically Windows.
Why Would You Want To Do This?
You should already be using Git with most projects and should periodically be pushing changes up to a remote. However, what if you’ve coded for two days, you haven’t pushed any changes, and your laptop blows up or gets stolen? You’ve lost two days of work. Or what if you simply want to hack away on a pet project that doesn’t need version control? My point is, version control should not be confused with a backup system.
You might jump to use Dropbox or Google Drive, but you’ll soon notice the endless scanning of node_modules directories and the steep jump in used storage space. This is because your JavaScript projects’ node_modules directories take up massive amounts of space, sometimes hundreds of megabytes. None of this code needs to be backed up either. It’s code you haven’t personally written, and it can be repopulated by running npm install. We want to somehow exclude these directories.
We might also want to exclude other directories that take up a lot of space, like .git directories.
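If you are curious how much space is at stake on your own machine, here is a minimal sketch that lists each node_modules directory with its size. The function name is made up for illustration, and $HOME/Projects is only an assumed default location.

```shell
#!/usr/bin/env bash
# List every node_modules directory under a projects folder with its size.
# "$HOME/Projects" is only an example default; pass your own path instead.
list_node_modules_sizes() {
  local projects_dir="${1:-$HOME/Projects}"
  # -prune stops find from descending into a node_modules it has matched,
  # so nested node_modules folders are not listed a second time.
  find "$projects_dir" -type d -name node_modules -prune -exec du -sh {} +
}

# Example call, guarded so the snippet is harmless if the folder is missing.
if [ -d "$HOME/Projects" ]; then
  list_node_modules_sizes "$HOME/Projects"
fi
```

Swapping node_modules for .git in the find expression gives the same report for Git directories.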
It’s Not Possible Out-of-the-Box
Both Dropbox and Google Drive allow you to exclude specific directories. It’s called selective sync on Dropbox and it’s also possible on Google Drive. However, you must manually hand-pick each node_modules directory for exclusion. If you rename a project or add a new one, you’ll need to remember to change your settings.
It would be great if it were possible to ignore all files/directories that match a specific name. This has been a highly requested feature for years, but unfortunately still hasn’t been implemented.
Existing Solutions
- Use a different syncing service that does support pattern matching, like Resilio or Seafile. However, do you really want to run a new service alongside your existing Dropbox or Google Drive installation?
- Programmatically instruct Dropbox to ignore certain files via scripts like DropboxIgnore or .dbignore. However, these seem somewhat hacked together or don’t support the latest versions of Dropbox, and they don’t support Google Drive at all.
- Use selective sync to manually exclude node_modules for each project (as mentioned above).
Our Rsync Solution
I attempted a number of solutions that didn’t quite work. However, I landed on a rather elegant solution that achieves all I wanted and more. In summary:
- You must have a single directory that contains all your coding projects, and it must live outside your Dropbox / Google Drive folder. I keep mine in $HOME/Projects.
- We will use rsync to copy the files from our Projects directory INTO our Dropbox / Google Drive folder, for example $HOME/Dropbox/Projects-Backup. We will configure rsync to exclude certain files from the copy.
- We will make this execute at a certain interval (every 10 minutes) by using cron. Rsync will be efficient and quick to execute if very little has changed.
- Every time the cron job executes, your changes will be synced to the cloud.
Pros
- Works with any syncing service, not just Dropbox. As a result, you won’t need to run another specialized syncing client.
- You can ignore the same files that your .gitignore does. If your .gitignore is configured to ignore generated dist files or other vendor files, those will be ignored too. Pretty neat!
- Allows you to put your projects directory outside of your Dropbox or Google Drive directories, which feels more natural. This was previously impossible with Google Drive and only possible with Dropbox using symlinks.
Cons
- Requires a bit of scripting. But you might prefer this over a GUI. Either way, this guide will hold your hand.
- Won’t sync immediately as files are changed. However, you can configure syncing to happen at a very frequent interval of time and it will still be efficient.
- It mirrors a second copy of the relevant files on your same local filesystem, wasting a bit of space.
- It won’t do two-way mirroring. For example, you can’t receive new changes from Dropbox users on other computers.
Will It Work on Windows?
Our technique relies on bash scripting and cron jobs, which Windows does not natively support. However, you can simulate a POSIX environment on Windows using Cygwin and it seems you can run cron jobs by following this article. You might be able to use Windows Task Scheduler instead of cron.
If you’ve attempted this on Windows or have any insight, please comment on this post!
Step 1) Confirm you have rsync
macOS and most Linux distros should come preloaded with the rsync binary. Check that you have it by opening up a command prompt and running:
which rsync
It should print out something like /usr/bin/rsync.
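If you would rather do this check from a script, a small sketch like the following may help. It uses the shell builtin command -v in place of which, and the function name check_rsync is just an illustrative choice.

```shell
#!/usr/bin/env bash
# Report whether rsync is on the PATH, and which version it is.
check_rsync() {
  if command -v rsync >/dev/null 2>&1; then
    echo "rsync found at: $(command -v rsync)"
    # The first line of --version names the installed version.
    rsync --version | head -n 1
  else
    echo "rsync not found; install it with your package manager" >&2
    return 1
  fi
}

# "|| true" keeps the snippet from aborting a larger script when rsync is absent.
check_rsync || true
```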
Step 2) Write the Sync Script
Create a new bash script with a .sh extension:
#!/usr/bin/env bash
set -e # always immediately exit upon error

# directory config. ending slashes are important!
src_dir="$HOME/Projects/"
dest_dir="$HOME/Dropbox/Projects-Backup/"

# run the sync
rsync -ar --delete \
  --filter=':- .gitignore' \
  --exclude='node_modules' \
  --exclude='.git' \
  --exclude='.DS_Store' \
  --chmod='F-w' \
  "$src_dir" "$dest_dir"
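Make sure it has execute permission. You can run chmod +x <path-to-file> to do this.
Change src_dir and dest_dir to the directories you’d like. Make sure to include trailing slashes! Our rsync setup needs this: without the trailing slash on src_dir, rsync would create a Projects directory inside dest_dir instead of copying its contents.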
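To see the trailing-slash behavior for yourself, here is a small throwaway demo. It works entirely inside a temp directory, and the directory and function names are made up for the example.

```shell
#!/usr/bin/env bash
# Demonstrate how a trailing slash on the source changes what rsync copies.
# Everything happens inside a throwaway temp directory.
demo_trailing_slash() {
  local tmp
  tmp="$(mktemp -d)"
  mkdir -p "$tmp/Projects/app"
  echo "hello" > "$tmp/Projects/app/index.js"

  # With a trailing slash: the CONTENTS of Projects/ land in with-slash/.
  rsync -a "$tmp/Projects/" "$tmp/with-slash/"
  # Without one: the Projects directory itself is created inside no-slash/.
  rsync -a "$tmp/Projects" "$tmp/no-slash/"

  # Show the resulting files relative to the temp directory.
  (cd "$tmp" && find with-slash no-slash -type f | sort)
  rm -rf "$tmp"
}

if command -v rsync >/dev/null 2>&1; then
  demo_trailing_slash
fi
```

The listing shows with-slash/app/index.js for the first copy and no-slash/Projects/app/index.js for the second, which is the extra nesting level we want to avoid.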
Please add or remove the --exclude args to your liking.
The a flag is a compound flag telling rsync to do a bunch of things that you don’t need to worry about (it preserves permissions, timestamps, and symlinks, among other things). The r flag tells rsync to work recursively; a already implies it, so it is redundant here but harmless. The --delete flag tells rsync to delete files in dest_dir when they no longer exist in src_dir. I always do this because I don’t want dest_dir to accumulate old deleted files. If I want access to old deleted files, I can use Dropbox’s or Google Drive’s history.
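The --filter=':- .gitignore' portion of the rsync command is really neat. It tells rsync to exclude the files that are listed in the .gitignore file in each directory. These are files you already don’t care about, such as vendor files or temp files, so you won’t be forced to create another list! One caveat: rsync reads these lines as its own exclude patterns, which are similar to Git’s but not identical, so unusual entries (negations with !, for example) may behave differently.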
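Here is a small sketch of that filter in action on a throwaway directory tree. The file names and the function name are invented for the demo; only the --filter option itself comes from the script above.

```shell
#!/usr/bin/env bash
# Show that --filter=':- .gitignore' applies each directory's .gitignore
# only to that directory and the directories below it.
demo_gitignore_filter() {
  local tmp
  tmp="$(mktemp -d)"
  mkdir -p "$tmp/src/app/dist" "$tmp/src/app/lib" "$tmp/src/other/dist"

  # Only the "app" project ignores dist/ and log files.
  printf 'dist/\n*.log\n' > "$tmp/src/app/.gitignore"
  echo "built" > "$tmp/src/app/dist/bundle.js"
  echo "code"  > "$tmp/src/app/lib/main.js"
  echo "noise" > "$tmp/src/app/debug.log"
  echo "kept"  > "$tmp/src/other/dist/keep.js"

  rsync -a --filter=':- .gitignore' "$tmp/src/" "$tmp/dest/"

  # List what actually arrived in the destination.
  (cd "$tmp/dest" && find . -type f | sort)
  rm -rf "$tmp"
}

if command -v rsync >/dev/null 2>&1; then
  demo_gitignore_filter
fi
```

In the listing, app/dist/bundle.js and app/debug.log are missing, while other/dist/keep.js is still copied because the other project has no .gitignore of its own.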
The --chmod='F-w' portion tells rsync that the copied files (but not directories) should have their write permissions removed. This is a good idea because it prevents us from accidentally going into the dest_dir instead of the authoritative src_dir and making edits. We unfortunately can’t do this for directories because we need to allow rsync to add new files and delete old ones.
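The following sketch illustrates the effect of --chmod='F-w' in a temp directory. The names are hypothetical. It prints the permission strings of a copied directory and file, then shows that a later rsync run can still update the read-only copy.

```shell
#!/usr/bin/env bash
# Show that --chmod='F-w' strips write permission from copied files only,
# and that later syncs can still replace those files.
demo_readonly_copy() {
  local tmp
  tmp="$(mktemp -d)"
  mkdir -p "$tmp/src/app"
  echo "v1" > "$tmp/src/app/notes.txt"

  rsync -a --chmod='F-w' "$tmp/src/" "$tmp/dest/"
  # Print the mode and path: the file has no "w", the directory still does.
  ls -ld "$tmp/dest/app" "$tmp/dest/app/notes.txt" | awk '{print $1, $NF}'

  # Change the source (a different size, so rsync is sure to notice) and re-sync.
  echo "version two" > "$tmp/src/app/notes.txt"
  rsync -a --chmod='F-w' "$tmp/src/" "$tmp/dest/"
  cat "$tmp/dest/app/notes.txt"

  rm -rf "$tmp"
}

if command -v rsync >/dev/null 2>&1; then
  demo_readonly_copy
fi
```

The update works because rsync writes a new temporary file and renames it into place, which only needs write permission on the directory.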
Step 3) Test the Sync Script
Let’s ensure the syncing script works. Open a prompt and execute it! You should see a newly created dest_dir that excludes all the node_modules directories and whatnot.
Try running it a second time and notice how fast it runs.
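If you are nervous about --delete, you can preview a run before letting it touch anything. This is a sketch, not part of the original script: preview_sync is a made-up helper that reuses the same filters and adds rsync’s --dry-run and --itemize-changes options.

```shell
#!/usr/bin/env bash
# Preview what a sync would copy or delete, without changing any files.
preview_sync() {
  local src_dir="$1" dest_dir="$2"
  # --dry-run reports the planned actions only; --itemize-changes prints
  # one line per file or directory that would be touched.
  rsync -a --delete --dry-run --itemize-changes \
    --filter=':- .gitignore' \
    --exclude='node_modules' \
    --exclude='.git' \
    --exclude='.DS_Store' \
    "$src_dir" "$dest_dir"
}

# Example call using the paths from this guide, guarded so it only runs
# when rsync and both folders exist.
if command -v rsync >/dev/null 2>&1 && [ -d "$HOME/Projects" ] && [ -d "$HOME/Dropbox" ]; then
  preview_sync "$HOME/Projects/" "$HOME/Dropbox/Projects-Backup/"
fi
```

Lines that mention deleting are the ones --delete would remove from the destination on a real run.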
Step 4) Install the Cron Job
Next, execute crontab -e to open up a text editor for editing your cron tasks. Add the line:
*/10 * * * * <path-to-sync-script>
This runs our sync script every 10 minutes. Change the 10 to a 5 to run it every 5 minutes, for example. And of course make sure to write the absolute path in place of <path-to-sync-script>.
If you want to programmatically install a cron job with a rerunnable script, see this solution.
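One possible way to make the install rerunnable is sketched below. It is not necessarily the linked solution. The function names and the example path are hypothetical. The idea is to strip any old line that mentions the script before appending a fresh one.

```shell
#!/usr/bin/env bash
# Print a crontab that contains exactly one entry for the given script.
# Reads the current crontab text on stdin, so it can be tried without cron.
with_sync_job() {
  local script_path="$1"
  # Drop any existing line that mentions the script; keep everything else.
  # grep exits non-zero when nothing is left to print, hence "|| true".
  grep -vF -- "$script_path" || true
  echo "*/10 * * * * $script_path"
}

# Replace the current user's crontab with the updated version.
install_sync_job() {
  local script_path="$1"
  # crontab -l fails when no crontab exists yet, so tolerate that case.
  { crontab -l 2>/dev/null || true; } | with_sync_job "$script_path" | crontab -
}

# Example (hypothetical path); uncomment to actually install the job:
# install_sync_job "$HOME/bin/sync-projects.sh"
```

Because the old entry is removed before the new one is added, running the installer twice leaves a single line for the script. Note that it removes every line containing the script path, including comments.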
Step 5) Test the Cron Job
Verify the cron job was saved by running crontab -l to list installed jobs.
After installing the cron, try to make some modifications within your projects directory. Then, after the current 10-minute interval of time ends (if it’s 4:32, wait until 4:40), watch your files sync!
[Image: Google Drive on Mac]
[Image: Dropbox on Ubuntu]
Congrats!
Appendix: Unsuccessful Attempts
Initially, I did not enjoy the fact that this technique executes periodically on a timer as opposed to immediately after modification. I attempted to use lsyncd, essentially a wrapper around rsync that responds to filesystem changes, but then realized that Mac isn’t really supported.
I then attempted to use fswatch to trigger the rsync command but had lots of trouble. First, I could not figure out how to introduce a delay or some other mechanism to prevent the syncing from happening too frequently/concurrently. Also, I had trouble installing it as a daemon on Mac. I eventually gave up and resorted to a 10-minute cron job, which works great.
A completely different approach I attempted was to mount a read-only virtual volume that mirrored our target directory but configure it to exclude certain files. I would either mount it within the Dropbox / Google Drive folder or mount it elsewhere and symlink it (in the case of Dropbox). I attempted this with rofs-filtered but Dropbox would randomly hang or not pick up on change events. I should have expected this because Dropbox lists only certain supported filesystems and FUSE is not one of them.
There’s another solution with a commercial program called GoodSync which is described in this article, though I discovered it too late in the process to try it out personally.