Setting up a local mirror for git repositories
This post explains how to mirror your repositories from GitHub, GitLab, Bitbucket, etc.
- Aug 5, 2023
- 6 min read
- #guides
- #git
- #selfhosted

GitHub is great! Everyone loves GitHub, everyone uses GitHub, and everyone’s coding path is most certainly hosted on GitHub.
Unless GitHub decides that it doesn’t want your repository (or your account) due to reasons. And as many people have learned “the hard way”, you should never trust corporations with data you’d prefer not to lose forever just because Google is mad at you and sent a DMCA takedown request, or because your passport color is less socially acceptable today than it was yesterday, or for other reasons that are out of scope of this blog post.
Having local copies of your repos is nice, but keeping all of them in sync is not so nice. And if you always buy the cheapest MacBooks (like me), you may not have the disk space to keep carbon copies of all your projects.
Requirements
- Low resource usage: I don’t need a whole GitLab running at home, why would anyone need a whole GitLab running at home?
- Run and forget: No complicated deploy, no database servers, no memcache, nothing that can break
- Syncs changes automatically: I don’t want to interact with it at all. After a repo is added, changes should arrive by themselves
- No CI pipelines: It would be madness to set up CI for 100+ repos spread across 3 hosted git providers
- Selfhostable: You sure can just import your repos to another cloud provider, but why?
Soft Serve
Soft Serve 🔗 is a minimalist self-hosted git server, controlled entirely via ssh.
It has a nice TUI to browse your repos, and a surprising number of features. It also doesn’t require any external databases, sips memory, and pulls changes by itself every 10 minutes.
There are several ways to run it. I’m using a Docker container running on my TrueNAS Scale home server, but you can choose whichever variant you like. I won’t include a whole installation guide for Soft Serve, but I’ll note a couple of caveats:
SSH key
When you specify an ssh key for admin access, there are some caveats 🔗. You’ll need a separate ed25519 key, and you can generate one using this command:
$ ssh-keygen -t ed25519 -C "your@email"
SSH config for easy access
Here’s my ~/.ssh/config entry for Soft Serve:
Host soft-serve
  HostName 192.168.2.3
  Port 23231
  IdentityFile ~/.ssh/id_ed25519
Now you can access it with the ssh soft-serve command.
Access control
If you are running it on a public server, make sure to limit public access 🔗.
By default, connecting users will have read-only access to all the public repos. I’m running it inside a home network, but decided to keep private repos private even there.
You can check what an anonymous user would see with this command:
$ ssh your-soft-serve-ip -p 23231 -i /dev/null
Note: Do not use the alias from your ssh config, because it will still use the key configured there, even if you override the key with the -i argument. Enter the original IP and port manually.
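If a running ssh-agent or your config file still offers a key, one possible way to rule both out is to pass a few extra client options. This is only a sketch, assuming a reasonably recent OpenSSH client: anon_ssh is a hypothetical helper name, 23231 is the port used earlier in this post, and what a keyless user is actually allowed to see depends on your Soft Serve settings.

```shell
# anon_ssh HOST [PORT]: connect without offering any key.
# Hypothetical helper, not part of Soft Serve or OpenSSH.
anon_ssh() {
    host=$1
    port=${2:-23231}
    # -F /dev/null        ignore ~/.ssh/config (no alias, no IdentityFile)
    # IdentitiesOnly=yes  only consider keys passed with -i
    # IdentityAgent=none  do not ask a running ssh-agent for keys
    # -i /dev/null        the only "key" offered is an empty file
    ssh -F /dev/null \
        -o IdentitiesOnly=yes \
        -o IdentityAgent=none \
        -i /dev/null \
        -p "$port" "$host"
}
```

Usage: anon_ssh 192.168.2.3 (or anon_ssh your-soft-serve-ip 23231).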
Accessing private repos
To mirror private repos, you’ll need to get the client key from Soft Serve.
Use this command to get the public key of Soft Serve’s git client:
$ sudo cat /your-soft-serve-volume-location/ssh/soft_serve_client_ed25519.pub
Note: sudo may be required here due to permissions on the ssh folder.
Then you’ll need to add this key as a deploy key to your private repos (the preferred way) or as a key for your user (note that this would give read-write access to all your repositories).
Naming of the repositories
Because I need to import repositories from different git hosts, I’ve decided to name my repositories like this:
website.com/username/reponame
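For manual imports, the name can be derived from the clone address itself. A minimal sketch, assuming the address is either an scp-like ssh address (git@host:owner/repo.git) or a URL with a scheme and no port; mirror_name is a hypothetical helper name:

```shell
# mirror_name URL: print "host/owner/repo" for an ssh or https clone URL.
# Hypothetical helper; URLs with an explicit port are not handled.
mirror_name() {
    url=${1%.git}                # drop a trailing .git
    case $url in
        *://*)                   # https://host/owner/repo, ssh://git@host/owner/repo
            url=${url#*://}      # drop the scheme
            url=${url#*@}        # drop an optional user@
            ;;
        *@*:*)                   # scp-like: git@host:owner/repo
            url=${url#*@}
            host=${url%%:*}
            url=$host/${url#*:}
            ;;
    esac
    printf '%s\n' "$url"
}
```

For example, mirror_name git@github.com:username/reponame.git prints github.com/username/reponame.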
Adding repos
To manually import a repository, you can use this command:
$ ssh soft-serve repo import some/namespace/repo-name address
This command will create a public repo, imported from the address.
There are several flags that are important for our use case:
- -p makes the repository private
- -m flags the repository as a mirror that will be updated regularly
I needed to import 100+ repositories, so I’ve made a couple of commands that simplify that task. Feel free to modify these commands to your liking!
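The commands in the sections below only print ssh soft-serve repo import lines; they do not run them. One way to execute them after a review is sketched here, assuming you saved the output to a file first. run_imports and the DRY_RUN variable are hypothetical names, and since each line is passed to eval, only feed it a file you have read through.

```shell
# run_imports FILE: run every import command listed in FILE, one per line.
# Set DRY_RUN=1 to print the commands instead of running them.
run_imports() {
    file=$1
    while IFS= read -r line || [ -n "$line" ]; do
        # Skip blank lines and comments left during review.
        case $line in
            ''|'#'*) continue ;;
        esac
        if [ "${DRY_RUN:-0}" = 1 ]; then
            printf 'would run: %s\n' "$line"
        else
            # stdin comes from /dev/null so ssh cannot swallow the rest of FILE.
            eval "$line" </dev/null || printf 'failed: %s\n' "$line" >&2
        fi
    done <"$file"
}
```

A possible workflow: redirect the generated commands into imports.txt (a made-up file name), delete the lines you don’t want, check with DRY_RUN=1 run_imports imports.txt, then run run_imports imports.txt.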
GitHub
There are different categories of GitHub repos:
- Public
- Public Archive
- Private
- Forks
And for each of these I’ll take a different approach:
- Public: import + mirror (sync changes periodically)
- Public Archive: import once, but do not sync changes
- Private: import + mirror + private
- Forks: ignore
First, go to github.com/settings/tokens 🔗 and create a personal access token (the classic one) with the repo permission. The shortest expiration is fine, because this is a one-time operation.
Export your token as an env variable:
$ export GITHUB_TOKEN=<YOUR_GITHUB_TOKEN>
Prepare the import commands:
$ curl -L \
    -H "Accept: application/vnd.github+json" \
    -H "Authorization: Bearer $GITHUB_TOKEN" \
    -H "X-GitHub-Api-Version: 2022-11-28" \
    https://api.github.com/user/repos\?per_page\=100\&page\=1\&affiliation\=owner \
  | jq -r '
    .[]
    | select(.fork == false)
    | [
        "ssh soft-serve repo import",
        "github.com/" + .full_name,
        .ssh_url,
        if .private then "-p" else "" end,
        if .archived == false then "-m" else "" end
      ]
    | map(select(length > 0))
    | join(" ")
  '
This command will:
- Fetch the first 100 repos owned by your user
- Filter out all the forks
- Flag private repos as private
- Flag non-archived repos as mirrors
If you need to import public repos via https, replace .ssh_url with if .private then .ssh_url else .clone_url end.
Increment the page number if you have more than 100 repositories.
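Bumping the page number by hand can also be turned into a loop. The sketch below assumes the same request as above and stops at the first empty page; fetch_github_page, fetch_all_pages and MAX_PAGES are hypothetical names. The empty-page check is a plain string comparison after stripping whitespace, which is a shortcut, not something the GitHub API documents.

```shell
# fetch_github_page N: print page N of the repo list as JSON.
fetch_github_page() {
    curl -sL \
        -H "Accept: application/vnd.github+json" \
        -H "Authorization: Bearer $GITHUB_TOKEN" \
        -H "X-GitHub-Api-Version: 2022-11-28" \
        "https://api.github.com/user/repos?per_page=100&page=$1&affiliation=owner"
}

# fetch_all_pages FETCH: call "FETCH 1", "FETCH 2", ... and print each page
# until one comes back as an empty JSON array.
fetch_all_pages() {
    fetch=$1
    page=1
    while [ "$page" -le "${MAX_PAGES:-50}" ]; do
        body=$("$fetch" "$page") || return 1
        compact=$(printf '%s' "$body" | tr -d '[:space:]')
        case $compact in
            '[]') break ;;                       # empty page: nothing left
            '['*) printf '%s\n' "$body" ;;       # a page of repos
            *)    printf 'unexpected response on page %s\n' "$page" >&2
                  return 1 ;;                    # e.g. an error object
        esac
        page=$((page + 1))
    done
    return 0
}
```

The output is a sequence of JSON arrays, one per page, so fetch_all_pages fetch_github_page can be piped into the same jq -r filter as above; jq applies the filter to each array in turn.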
GitLab
The repo categories are roughly the same as on GitHub.
Create a personal access token with the read_api permission and a short expiration date.
Export your token as an env variable:
$ export GITLAB_TOKEN=<YOUR_GITLAB_TOKEN>
Prepare the import commands:
$ curl -L \
    -H "Accept: application/json" \
    -H "Authorization: Bearer $GITLAB_TOKEN" \
    https://gitlab.com/api/v4/projects\?owned\=true\&per_page\=100 \
  | jq -r '
    .[]
    | select(.empty_repo == false)
    | [
        "ssh soft-serve repo import",
        "gitlab.com/" + .path_with_namespace,
        .ssh_url_to_repo,
        if .visibility == "private" then "-p" else "" end,
        if .archived == false then "-m" else "" end
      ]
    | map(select(length > 0))
    | join(" ")
  '
Replace gitlab.com with your GitLab instance hostname.
This command will:
- Fetch the first 100 repos owned by your user (including groups)
- Filter out all the empty repos
- Flag private repos as private
- Flag non-archived repos as mirrors
If you need to import public repos via https, replace .ssh_url_to_repo with if .visibility == "private" then .ssh_url_to_repo else .http_url_to_repo end.
GitLab implements keyset pagination, so good luck dealing with that :)
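If you do end up with more than 100 projects, here is a rough sketch of following keyset pagination. It assumes GitLab returns the next page URL in a Link response header with rel="next" when the request includes pagination=keyset, order_by=id and sort=asc, as its REST API documentation describes. next_link and gitlab_projects_all are hypothetical names, and the loop has not been tested against a real instance.

```shell
# next_link HEADERS_FILE: print the rel="next" URL from a Link header, if any.
next_link() {
    tr -d '\r' <"$1" \
        | grep -i '^link:' \
        | sed 's/^[^:]*: *//' \
        | tr ',' '\n' \
        | sed -n 's/^ *<\([^>]*\)>.*rel="next".*/\1/p'
}

# gitlab_projects_all: print every page of owned projects, one JSON array each.
gitlab_projects_all() {
    url="https://gitlab.com/api/v4/projects?owned=true&per_page=100&pagination=keyset&order_by=id&sort=asc"
    headers=$(mktemp) || return 1
    while [ -n "$url" ]; do
        # -D saves the response headers so the Link header can be read afterwards.
        curl -sf -D "$headers" \
            -H "Accept: application/json" \
            -H "Authorization: Bearer $GITLAB_TOKEN" \
            "$url" || break
        echo                         # newline between pages
        url=$(next_link "$headers")  # empty when there is no next page
    done
    rm -f "$headers"
}
```

As with the single-page request, the output of gitlab_projects_all can be piped into the jq -r filter above.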
Bitbucket
Bitbucket does not have archives, so we always mark repositories as mirrors here.
Create an “App Password” (don’t forget to remove it after import is done) in “Personal Settings” with the repositories permission.
Then export it:
$ export BITBUCKET_CREDENTIALS=<your-username>:<app-password>
And here’s the command for preparing the list of imports:
$ curl -L \
    -H "Accept: application/json" \
    --user "$BITBUCKET_CREDENTIALS" \
    https://api.bitbucket.org/2.0/repositories\?role\=owner\&pagelen\=100 \
  | jq -r '
    .values
    | .[]
    | [
        "ssh soft-serve repo import",
        "bitbucket.org/" + .full_name,
        (.links.clone[] | select(.name == "ssh") | .href),
        if .is_private then "-p" else "" end,
        "-m"
      ]
    | map(select(length > 0))
    | join(" ")
  '
This command:
- Fetches 100 repositories owned by your user
- Makes private repos private in soft-serve
If you need to import public repos via https, replace (.links.clone[] | select(.name == "ssh") | .href) with if .is_private then (.links.clone[] | select(.name == "ssh") | .href) else (.links.clone[] | select(.name == "https") | .href) end.