Indexing 100M vectors in 20 minutes on PostgreSQL with 12GB RAM

(blog.vectorchord.ai)

69 points | by gaocegege 5 days ago

16 comments

nwellinghoff 5 days ago

Too bad AWS does not support any of these other vector extensions in managed RDS.

ayende 4 hours ago

This suffers from a serious issue: you must have the data upfront, you cannot build the index in an incremental fashion.
There is also no mention of how this would handle updates, and from the description, even if updates are possible, the index will degrade over time, requiring a new indexing batch.

esafak 3 hours ago

How does it compare with ParadeDB and LanceDB?

duckbot3000 5 days ago

Kinda makes you wonder why you need cloud for anything besides remote encrypted backups if you can run all that on 12GB

- riku_iki 4 days ago
  
  What about the failover story if the server dies? PG failover setup is complicated, and cloud infra handles this for you.
  
  - logifail 3 hours ago
    
    (Genuine question) What's your current plan for when your cloud provider goes offline? Do you have a failover story, or is it a case of "wait for them to come back online"?
    
    - riku_iki 27 minutes ago
      
      I have backups on a different cloud provider, so I could bootstrap the DB if the provider goes dark indefinitely.
      But realistically, I believe the major clouds (Google, AWS) likely have a more robust org and infra for recovery than I can build and maintain.
  - tjwebbnorfolk 2 hours ago
    
    What are you willing to pay for cloud-native failover?
    Not every use case requires 100% uptime
    
    - riku_iki 26 minutes ago
      
      Sure, but those who require it (99% of major businesses) are ready to pay.
  - benjiro 2 hours ago
    
    https://github.com/multigres/multigres ... when it's complete. From the guy that made Vitess for MySQL.
    And yes, I agree, the PG failover setup (and especially dealing with a failure afterwards, to restore the ex-master) is beyond infuriating.
    But it's not "pay 10x the amount while easily eating a 10x performance hit" infuriating :)
  - positron26 4 hours ago
    
    Do we mean managed or PG on K8s like CNPG? In all cases, I use the infra to simplify things like having disk redundancy and failover nodes, not because 12GB is interesting.
    
    - riku_iki 4 hours ago
      
      Primarily managed PG, since you still need setup/maintenance/monitoring for your own K8s solution.
- setr 5 days ago
  
  Because getting any hardware out of infra-team on premise is utterly miserable, across the board.
  
  - lelanthran 4 hours ago
    
    That's not the only alternative.
    Rent your VPS and add in extra volumes for like $10 per 100GB.
    
    - Imustaskforhelp 2 hours ago
      
      Funny thing, but Netcup has $10 per 1 TB.
      Netcup is underrated, but there are also other providers at lowendbox/lowendtalk, and I am interested to try out Hetzner sometime too.
      
      - benjiro an hour ago
        
        And if you want to go even cheaper, check out Hetzner's EX63 (go to custom) > 4x 7.68TB drives for like 140 Euro.
        Not counting the fact that Netcup is RAIDed (also Netcup is limited to 8TB on a VPS).
        That is like 4.7 Euro/TB. That is like 4$/TB. 6 Euro/TB in a RAID 5 setup.
        I do not understand why they are not using this new pricing model on their older servers. There, the best you can get is like 10 Euro/TB (for the single 15TB U.2).