Is there a way to do entity resolution / entity deduplication at scale with PostgreSQL and pgvector?
Arik Levy
I'm working on merging company data from several different providers. I'm exploring an entity resolution approach that uses separate embeddings for name, location, and domain, each stored in a vector index. I'm considering Postgres as the database, but I'm not sure this is even possible. I know you can run individual similarity searches, e.g. given a name vector, return the 10 closest names. What I'd ideally like to do is take e.g. 400 million firms, run a clustering algorithm over them, and end up with 100 million resolved firms.
Is this something that can be done in PostgreSQL?
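To make the intended pipeline concrete, here is a minimal in-memory sketch of the clustering step, assuming candidate pairs have already been produced by per-field nearest-neighbour queries (for example, a pgvector query ordered by the `<=>` cosine-distance operator with `LIMIT 10`). The per-field weights, the `0.9` threshold, and the names `match_score` and `resolve` are hypothetical illustrations. Union-find (connected components over pairs that pass the threshold) is just one possible clustering choice, not necessarily the algorithm you would end up using; the toy data uses brute-force pairs instead of a real kNN query.

```python
import math
from itertools import combinations

# Hypothetical per-field weights for combining similarities (assumption).
WEIGHTS = {"name": 0.5, "location": 0.2, "domain": 0.3}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def match_score(r1, r2):
    """Weighted sum of per-field cosine similarities (hypothetical scoring)."""
    return sum(w * cosine(r1[f], r2[f]) for f, w in WEIGHTS.items())


class UnionFind:
    """Disjoint-set structure used to merge matching records into clusters."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def resolve(records, candidate_pairs, threshold=0.9):
    """Cluster records: link each candidate pair whose score meets the threshold."""
    uf = UnionFind(len(records))
    for i, j in candidate_pairs:
        if match_score(records[i], records[j]) >= threshold:
            uf.union(i, j)
    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(uf.find(i), []).append(i)
    return sorted(clusters.values())


# Toy data: records 0 and 1 are near-duplicates, record 2 is unrelated.
toy_records = [
    {"name": [1.0, 0.0], "location": [1.0, 0.0], "domain": [1.0, 0.0]},
    {"name": [0.99, 0.1], "location": [1.0, 0.0], "domain": [1.0, 0.0]},
    {"name": [0.0, 1.0], "location": [0.0, 1.0], "domain": [0.0, 1.0]},
]
# In practice these pairs would come from per-field kNN queries, not all pairs.
toy_pairs = list(combinations(range(len(toy_records)), 2))

print(resolve(toy_records, toy_pairs))  # [[0, 1], [2]]
```

Each resolved firm corresponds to one cluster of record indices. The expensive part at the 400-million-row scale is generating the candidate pairs, which is where the vector indexes come in.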