Setting up a local mirror for git repositories

This post explains how to mirror your repositories from GitHub, GitLab, Bitbucket, etc.

soft serve example

GitHub is great! Everyone loves GitHub, everyone uses GitHub, and everyone’s coding path is most certainly hosted on GitHub.

Unless GitHub decides that it doesn’t want your repository (or your account) due to reasons. And as many people learned “the hard way”, you should never trust corporations with data you’d prefer not to lose forever, just because Google is mad at you and sent a DMCA takedown request, or because your passport color is not as socially acceptable today as it was yesterday, or for other reasons that are out of scope of this blog post.

Having local copies of your repos is nice, but keeping all of them in sync is not so nice. And if you always buy the cheapest MacBooks (like me), you may not have the disk space to keep carbon copies of all your projects.

Requirements

  • Low resource usage: I don’t need a whole GitLab running at home, why would anyone need a whole GitLab running at home?
  • Run and forget: No complicated deploy, no database servers, no memcache, nothing that can break
  • Syncs changes automatically: I don’t want to interact with it at all. After a repo is added, changes should arrive by themselves
  • No CI pipelines: It would be madness to set up CI for 100+ repos spread across 3 hosted git providers
  • Self-hostable: You sure can just import your repos to another cloud provider, but why?

Soft Serve

Soft Serve 🔗 is a minimalist self-hosted git server, controlled entirely via ssh.

It has a nice TUI to browse your repos, and a surprising amount of features. It also doesn’t require any external databases, sips memory, and pulls changes by itself every 10 minutes.

There are several ways to run it. I’m using a Docker container running on my TrueNAS Scale home server, but you can pick whichever variant you like. I won’t include a full installation guide for Soft Serve here, but I’ll note a couple of caveats:

SSH key

When you specify an ssh key for admin access, there are some caveats 🔗. You’ll need a separate ed25519 key, and you can generate one using this command:

Terminal window
$ ssh-keygen -t ed25519 -C "your@email"

SSH config for easy access

Here’s my ~/.ssh/config entry for Soft Serve:

Host soft-serve
    HostName 192.168.2.3
    Port 23231
    IdentityFile ~/.ssh/id_ed25519

Now you can access it with the ssh soft-serve command.

Access control

If you are running it on a public server, make sure to limit public access 🔗.

By default, connecting users have read-only access to all public repos. I’m running it inside a home network, but decided to keep private repos private even there.

You can check what an anonymous user would see with this command:

Terminal window
$ ssh your-soft-serve-ip -p 23231 -i /dev/null

Note: Do not use the alias from your ssh config here. ssh will still offer the key configured for that alias, even if you override it with the -i argument. Enter the original IP and port manually.

Accessing private repos

To mirror private repos, you’ll need the client key that Soft Serve uses for its outgoing git connections.

Use this command to get the public key of Soft Serve’s git client:

Terminal window
$ sudo cat /your-soft-serve-volume-location/ssh/soft_serve_client_ed25519.pub

Note: sudo may be required here due to permissions on the ssh folder.

Then add this key as a deploy key to your private repos (the preferred way) or as an SSH key for your user account (note that this would give read-write access to all your repositories).

Naming of the repositories

Because I need to import repositories from several different git hosts, I’ve decided to name my repositories like this:

website.com/username/reponame
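
As a minimal sketch of that naming scheme, the helper below derives a host/username/reponame name from a clone URL. The function name repo_name_from_url is made up for illustration, and it only handles the scp-like (git@host:path) and https:// forms; URLs with an explicit port are not covered.

```shell
# Hypothetical helper: turn a clone URL into a "host/username/reponame" name.
# Handles git@host:user/repo.git and https://host/user/repo.git only.
repo_name_from_url() {
  printf '%s\n' "$1" | sed -E \
    -e 's#^[a-z+]+://##' \
    -e 's#^[^@/]+@##' \
    -e 's#^([^/:]+):#\1/#' \
    -e 's#\.git$##'
}
```

For example, repo_name_from_url git@github.com:user/repo.git prints github.com/user/repo.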

Adding repos

To manually import a repository, you can use this command:

Terminal window
$ ssh soft-serve repo import some/namespace/repo-name address

This command will create a public repo, imported from the address.

There are several flags that matter for our use case:

  • -p makes the repository private
  • -m marks the repository as a mirror that will be updated regularly

I needed to import 100+ repositories, so I’ve put together a couple of commands that simplify the task. Feel free to modify these commands to your liking!
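
Note that the snippets in the following sections only print ssh soft-serve repo import lines; they don’t run anything. One possible way to execute them, sketched here under my own assumptions (the run_imports name and the DRY_RUN variable are hypothetical, not part of Soft Serve), is to save the output to a file, review it, and then run it line by line:

```shell
# Hypothetical helper: run (or just preview) one import command per line.
# DRY_RUN defaults to 1, so nothing is executed until you opt in.
run_imports() {
  file=$1
  while IFS= read -r line || [ -n "$line" ]; do
    # Skip blank lines and comments.
    case $line in
      ''|'#'*) continue ;;
    esac
    if [ "${DRY_RUN:-1}" = 1 ]; then
      printf 'would run: %s\n' "$line"
    else
      # </dev/null keeps ssh from swallowing the rest of the file on stdin.
      sh -c "$line" </dev/null || printf 'failed: %s\n' "$line" >&2
    fi
  done < "$file"
}
```

Redirect a generator’s output into, say, imports.txt, check it with run_imports imports.txt, and once the list looks right, run DRY_RUN=0 run_imports imports.txt.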

GitHub

There are different categories of GitHub repos:

  • Public
  • Public Archive
  • Private
  • Forks

I handle each of these differently:

  • Public: import + mirror (sync changes periodically)
  • Public Archive: import once, but do not sync changes
  • Private: import + mirror + private
  • Forks: ignore

First, go to github.com/settings/tokens 🔗 and create a personal access token (classic) with the repo scope. The shortest expiration is fine, because this is a one-time operation.

Export your token as an env variable:

Terminal window
$ export GITHUB_TOKEN=<YOUR_GITHUB_TOKEN>

Prepare the import commands:

Terminal window
$ curl -L \
    -H "Accept: application/vnd.github+json" \
    -H "Authorization: Bearer $GITHUB_TOKEN" \
    -H "X-GitHub-Api-Version: 2022-11-28" \
    "https://api.github.com/user/repos?per_page=100&page=1&affiliation=owner" \
  | jq -r '
      .[] |
      select(.fork == false) |
      [
        "ssh soft-serve repo import",
        "github.com/" + .full_name,
        .ssh_url,
        if .private then "-p" else "" end,
        if .archived == false then "-m" else "" end
      ] |
      map(select(length > 0)) |
      join(" ")
    '

This command will:

  • Fetch the first 100 repos owned by your user
  • Filter out all the forks
  • Flag private repos as private
  • Flag non-archived repos as mirrors

If you need to import public repos via https, replace .ssh_url with if .private then .ssh_url else .clone_url end.

Increment the page number if you have more than 100 repositories in total.
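
If you’d rather not bump the page number by hand, here is a rough sketch of a loop that keeps requesting pages until GitHub returns an empty array. The function names fetch_page and fetch_all_pages are my own; the request itself is the same one as above, just with the page number as a parameter.

```shell
# fetch_page N: print the JSON array for page N (needs GITHUB_TOKEN).
fetch_page() {
  curl -sL \
    -H "Accept: application/vnd.github+json" \
    -H "Authorization: Bearer $GITHUB_TOKEN" \
    -H "X-GitHub-Api-Version: 2022-11-28" \
    "https://api.github.com/user/repos?per_page=100&page=$1&affiliation=owner"
}

# fetch_all_pages: print every non-empty page, one JSON array after another.
fetch_all_pages() {
  page=1
  while :; do
    body=$(fetch_page "$page") || return 1
    compact=$(printf '%s' "$body" | tr -d '[:space:]')
    case $compact in
      '[]'|'') break ;;   # empty page: we are past the last one
      '['*) ;;            # a non-empty array: keep going
      *)                  # anything else is an error object, so stop
        printf 'unexpected response: %s\n' "$body" >&2
        return 1 ;;
    esac
    printf '%s\n' "$body"
    page=$((page + 1))
  done
}
```

jq applies its filter to each JSON document in the stream, so fetch_all_pages can be piped into the same jq -r filter as above in place of the single curl call.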

GitLab

The repo categories are roughly the same as on GitHub.

Create a personal access token with the read_api scope and a short expiration date.

Export your token as an env variable:

Terminal window
$ export GITLAB_TOKEN=<YOUR_GITLAB_TOKEN>

Prepare the import commands:

Terminal window
$ curl -L \
    -H "Accept: application/json" \
    -H "Authorization: Bearer $GITLAB_TOKEN" \
    "https://gitlab.com/api/v4/projects?owned=true&per_page=100" \
  | jq -r '
      .[] |
      select(.empty_repo == false) |
      [
        "ssh soft-serve repo import",
        "gitlab.com/" + .path_with_namespace,
        .ssh_url_to_repo,
        if .visibility == "private" then "-p" else "" end,
        if .archived == false then "-m" else "" end
      ] |
      map(select(length > 0)) |
      join(" ")
    '

If you use a self-hosted GitLab, replace gitlab.com with your instance’s hostname.

This command will:

  • Fetch the first 100 repos owned by your user (including groups)
  • Filter out all the empty repos
  • Flag private repos as private
  • Flag non-archived repos as mirrors

If you need to import public repos via https, replace .ssh_url_to_repo with if .visibility == "private" then .ssh_url_to_repo else .http_url_to_repo end.

GitLab implements keyset pagination, so good luck dealing with that if you have more than 100 projects :)
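
For what it’s worth, here is a hedged sketch of one way to deal with it: follow the rel="next" URL from the Link response header until there is none. The function names next_link and fetch_all_gitlab are made up, and the pagination=keyset, order_by=id and sort=asc query parameters are my assumption of what the projects endpoint expects, so check the GitLab API docs for your version before relying on them.

```shell
# next_link FILE: print the URL tagged rel="next" in a saved header dump,
# or nothing if there is no next page. Assumes URLs contain no raw commas.
next_link() {
  tr -d '\r' < "$1" |
    grep -i '^link:' |
    tr ',' '\n' |
    sed -n 's/.*<\([^>]*\)>; *rel="next".*/\1/p' |
    head -n 1
}

# fetch_all_gitlab: print every page of owned projects (needs GITLAB_TOKEN).
fetch_all_gitlab() {
  # Assumed query parameters for keyset pagination; adjust the hostname too.
  url="https://gitlab.com/api/v4/projects?owned=true&per_page=100&pagination=keyset&order_by=id&sort=asc"
  headers=$(mktemp)
  while [ -n "$url" ]; do
    # -D saves the response headers so the next URL can be read from them.
    curl -sL -D "$headers" \
      -H "Accept: application/json" \
      -H "Authorization: Bearer $GITLAB_TOKEN" \
      "$url" || break
    printf '\n'
    url=$(next_link "$headers")
  done
  rm -f "$headers"
}
```

As with a single page, the output of fetch_all_gitlab can be piped into the jq -r filter above.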

Bitbucket

Bitbucket does not have archives, so we always mark repositories as mirrors here.

Create an “App Password” (don’t forget to remove it after import is done) in “Personal Settings” with the repositories permission.

Then export it:

Terminal window
$ export BITBUCKET_CREDENTIALS=<your-username>:<app-password>

And here’s the command for preparing the list of imports:

Terminal window
$ curl -L \
    -H "Accept: application/json" \
    --user "$BITBUCKET_CREDENTIALS" \
    "https://api.bitbucket.org/2.0/repositories?role=owner&pagelen=100" \
  | jq -r '
      .values |
      .[] |
      [
        "ssh soft-serve repo import",
        "bitbucket.org/" + .full_name,
        (.links.clone[] | select(.name == "ssh") | .href),
        if .is_private then "-p" else "" end,
        "-m"
      ] |
      map(select(length > 0)) |
      join(" ")
    '

This command:

  • Fetches up to 100 repositories owned by your user
  • Makes private repos private in Soft Serve
  • Marks every repo as a mirror

If you need to import public repos via https, replace (.links.clone[] | select(.name == "ssh") | .href) with if .is_private then (.links.clone[] | select(.name == "ssh") | .href) else (.links.clone[] | select(.name == "https") | .href) end.

Licensed under CC BY-NC 4.0
