Is there a way to do entity resolution / entity deduplication at scale with PostgreSQL and pgvector?

Arik Levy
2025-01-14T10:16:20.3266667+00:00

I'm working on merging company data from several different providers. I'm exploring an entity resolution approach that uses separate embeddings for name, location, and domain, stored in vector indexes. I'm considering Postgres as the database, but I'm not sure this is even possible. I know you can run individual searches, e.g. given a name vector, return the 10 closest names to it. Ideally, though, I want to take, for example, 400 million firms, run a clustering algorithm, and end up with about 100 million resolved firms.

Is this something that can be done in PostgreSQL?
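One common way to structure this, sketched minimally here under stated assumptions: use the per-field k-nearest-neighbour lookups (what a pgvector query such as `ORDER BY name_embedding <=> ... LIMIT 10` returns) only as a blocking step that generates candidate pairs, score each candidate pair by combining its per-field similarities, and then take connected components over the pairs that clear a threshold. The clustering step therefore runs outside a single SQL query. In the Python below, `knn_candidates` is a brute-force cosine stand-in for the pgvector lookup, and the column name `name_embedding`, the field weights, `k`, `threshold`, and all function names are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical per-field weights; in practice these would be tuned on labelled match/non-match pairs.
WEIGHTS = {"name": 0.5, "location": 0.2, "domain": 0.3}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def knn_candidates(records, field, k):
    """Blocking step: brute-force stand-in for a per-field pgvector k-NN query.

    Returns unordered candidate pairs (id_a, id_b) with id_a < id_b.
    """
    pairs = set()
    for i, rec in records.items():
        neighbours = sorted(
            ((cosine(rec[field], other[field]), j) for j, other in records.items() if j != i),
            reverse=True,
        )[:k]
        for _, j in neighbours:
            pairs.add((min(i, j), max(i, j)))
    return pairs


def match_score(a, b):
    """Weighted sum of per-field cosine similarities."""
    return sum(w * cosine(a[f], b[f]) for f, w in WEIGHTS.items())


class UnionFind:
    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def resolve(records, k=10, threshold=0.9):
    """Group record ids into resolved entities via connected components of matched pairs."""
    candidates = set()
    for field in WEIGHTS:
        candidates |= knn_candidates(records, field, k)
    uf = UnionFind(records)
    for a, b in candidates:
        if match_score(records[a], records[b]) >= threshold:
            uf.union(a, b)
    clusters = defaultdict(list)
    for rid in records:
        clusters[uf.find(rid)].append(rid)
    return sorted(sorted(c) for c in clusters.values())


# Toy example: three provider records with 2-d embeddings (real embeddings are much larger).
records = {
    "a1": {"name": [1.0, 0.0], "location": [0.0, 1.0], "domain": [1.0, 1.0]},
    "b7": {"name": [0.99, 0.05], "location": [0.0, 1.0], "domain": [1.0, 1.0]},
    "c3": {"name": [0.0, 1.0], "location": [1.0, 0.0], "domain": [1.0, -1.0]},
}
print(resolve(records))  # [['a1', 'b7'], ['c3']]
```

Union-find keeps the clustering step cheap, but connected components are transitive: a single false match can chain two unrelated clusters together, so the choice of threshold (or a stricter clustering pass within each component) matters a lot at the 400-million-row scale.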

Azure Database for PostgreSQL