Show HN: My Private GitHub on Postgres

(github.com)

35 points | by calebhwin 8 hours ago

20 comments

hk1337 7 hours ago

Interesting idea but what's the use case for this? Why wouldn't I just create a private git server (gitlab, forgejo, etc) just for myself?

- somat 5 hours ago
  
  I suspect Postgres just brings efficient queries. My initial thought was of how Fossil uses SQLite as a backing store, but that comparison doesn't hold: SQLite is intentionally designed as an interchange format (stable specification), while the Postgres on-disk structure is intentionally not an interchange format (they reserve the right to change it at any time).
  So the only real reason is that you already have a Postgres server and want the efficient query indexes.
  As an interesting side note, I found this document on the internal data structure of Fossil: https://fossil-scm.org/home/doc/trunk/www/fossil-is-not-rela...
- tensegrist 4 hours ago
  
  as the other replies mention, efficient querying can be fun https://oseifert.ch/blog/building-pgit
- hungryhobbit 6 hours ago
  
  This seems like the elephant in the room.
  I'm not saying this project isn't cool, but whenever you have ANY software that's designed to be hosted A-style, and you host it B-style, the obvious question is "Why not host it the A way?"
xp84 7 hours ago

"doesn't support: ... Web UI."
So, it's a git server with an interesting storage layer? Don't get me wrong, that part sounds like it might have been a ton of work to implement, but I think the web UI (pull requests, etc.) is a lot of what GitHub has won on historically.
Basically, I don't feel qualified to judge the product itself, but I think positioning it against GitHub, while popular given the recent hard times, isn't quite correct.

- nomel 6 hours ago
  
  > "doesn't support: ... Web UI."
  "Doesn't" doesn't mean "can't". Someone just needs to do the work (with no thanks or pay expected).
  edit: the perspective of open source projects has really changed in the last 10 years, from collaboration to nice personal projects now being referred to as "the product".
  
  - xp84 3 hours ago
    
    I apologize if that’s how my tone has come across. I think I just got distracted by the comparison. I think it’s very cool as a project.
lisperforlife 6 hours ago

This is really cool. PG compresses TOASTed values (pglz by default, LZ4 optionally), so this should still be okay even if you are not storing pack files. I am curious about your choice of hand-rolling the pkt-line, upload-pack and receive-pack implementations, including rev-walking. Any particular reason you did not want to use libgit2 or something like the gitoxide implementation of pkt-line? Was it performance, or did you want it to be in pure Rust? Did you try running this on a slightly heavier repository with a lot of commits, refs and objects?
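For anyone wondering what hand-rolling pkt-line involves, here is a minimal Python sketch of the framing only. It is not the project's code (which I haven't read), and the function names are my own. Each packet is a 4-hex-digit length that counts the 4 header bytes plus the payload, and `0000` is the flush packet.

```python
FLUSH_PKT = b"0000"  # flush packet: ends a section of the conversation


def encode_pkt_line(payload: bytes) -> bytes:
    """Frame a payload as one pkt-line (hypothetical helper name)."""
    length = len(payload) + 4  # the length field counts its own 4 bytes
    if length > 65520:  # git's documented cap for a single pkt-line
        raise ValueError("payload too large for a single pkt-line")
    return f"{length:04x}".encode("ascii") + payload


def decode_pkt_lines(data: bytes) -> list:
    """Split a byte stream into payloads; a flush packet becomes None."""
    packets = []
    pos = 0
    while pos < len(data):
        length = int(data[pos:pos + 4].decode("ascii"), 16)
        if length == 0:
            packets.append(None)  # flush packet carries no payload
            pos += 4
            continue
        if length < 4:
            # Lengths 1-3 are not data packets; this sketch rejects them.
            raise ValueError(f"unsupported special packet at offset {pos}")
        packets.append(data[pos + 4:pos + length])
        pos += length
    return packets
```

The framing itself is small. upload-pack and receive-pack negotiation on top of it is where most of the work would be.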
supriyo-biswas 6 hours ago

I've always wanted to write something like this. The problem with GitLab/Gitea etc. is their reliance on disk storage, which means self-hosting them requires that I get the backup story just right. Whereas with this, I could just handle it as part of the database backup process.
Having no web UI, not even a rudimentary one, is kind of a bummer though.

- subhobroto 5 hours ago
  
  I've struggled with this decision myself, but I came to the opposite conclusion from you:
  - Gitea's (I use Forgejo) reliance on disk storage for `.git` is perfect for me because files are well understood as a concept by most people. (To be clear, Gitea/Forgejo stores non-`.git` artifacts in PostgreSQL.)
  Every battle-hardened Linux tool knows how to back up files. Plain old `rsync` can back up and restore files. I have heard of people putting their `.git` on something like Dropbox and having it work both for sync and backup (I've never tried it myself).
  You can run checksums on files and ensure they are exactly how you expect them to be.
  There are multiple well-tested, well-understood options to reliably back up, snapshot and restore files.
  Also, remote/cloud storage for files is really cheap. In most cases, if it's less than 10GB, you likely don't have to pay anything at all, as in $0 every month for having a backup on servers that won't go up in flames even if your laptop or house did.
  - OTOH, PostgreSQL backup and restore feel less popular and less accessible to the general population than file backup and restore.
  In fact, for non-DBA folks who don't necessarily understand PostgreSQL WAL, backup snapshotting, what asynchronous and synchronous WAL replication mean and how they affect RTO and RPO, there are definitely multiple non-obvious ways to get more things wrong than right and lose your data, something you wouldn't have to worry about when using file backup and restore.
  > Whereas with this, I could just handle it as part of the database backup process
  What's the database backup and restore process you follow right now and what are the tools you use?
throwatdem12311 7 hours ago

Just use Fossil at this point.

- sikozu 6 hours ago
  
  I'm waiting for somebody to create fossilhub
  
  - somat 4 hours ago
    
    Already exists.
    There are Fossil hosting sites, which is probably what you are talking about. I don't use one, but here is an example: https://chiselapp.com/
    But Fossil itself can already serve many projects, acting like a self-contained Fossil hub.
- lagniappe 7 hours ago
  
  Fossil really has it all.
bitbasher 6 hours ago

No license?
iririririr 6 hours ago

just use ssh and git bare.
JasonHEIN 7 hours ago

Great idea
Mic92 8 hours ago

Nice idea.
vishal_ch 8 hours ago

Interesting approach using Postgres as the storage layer. Curious how you're handling the object model since Git's content-addressable storage maps pretty differently to relational tables. Are you storing blobs as bytea or going with something like a JSONB tree structure for the commit graph?
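To make the content-addressable part concrete, here is a rough Python sketch of how git derives an object ID (SHA-1 repositories): the key is the hash of a `<type> <size>\0` header followed by the raw content. The in-memory dict is only a stand-in for a table keyed by object ID with a binary content column. The names are mine and say nothing about the schema this project actually uses.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Return the SHA-1 object ID git assigns to (type, content)."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


class ObjectStore:
    """Hypothetical stand-in for an objects table: oid -> (type, content)."""

    def __init__(self):
        self._rows = {}

    def put(self, obj_type: str, content: bytes) -> str:
        oid = git_object_id(obj_type, content)
        # Same content always yields the same key, so writes are idempotent.
        self._rows.setdefault(oid, (obj_type, content))
        return oid

    def get(self, oid: str):
        return self._rows[oid]


store = ObjectStore()
empty_blob = store.put("blob", b"")
print(empty_blob)  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the key is derived from the content, storing the same blob twice deduplicates naturally, which is the same property a primary key on the object ID would give in a relational table.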

- munk-a 7 hours ago
  
  While git internally uses a pretty loose system for connecting different model concepts, that has always seemed more like a concession to the storage medium than a desired step. If git existed on an already ACID-compliant system instead of trying to build one out of the filesystem itself, I don't see a reason to keep all the references as loose as they are. If you can cascade changes with confidence, you can likely just switch to using standard surrogate keys for linkages and allow the data to normalize more fully.
  The core model objects in git are all pretty straightforward and their interactions well defined.