This is a great practical application of pgvector! The HN corpus is perfect for semantic search because the discussions tend to be technical and well-structured.
I'm curious about the embedding model you chose. Did you compare different options (OpenAI's ada-002, Cohere, or open-source models like all-MiniLM)? And how does query performance hold up with pgvector at scale?
One feature that would be valuable: filtering by time range or karma score. Sometimes you want recent discussions vs. classic threads with high engagement.
Hey, great project! You mention that you didn't want to use a dedicated vector database here. Any particular reason? Have you also considered a search engine like Elasticsearch or OpenSearch?