data quality

    Longneck Data Integration

    Longneck Data Integration goes open source and free!

    Entity Resolution with Heavy Indexing

    Entity resolution (ER), or deduplication, is a computationally hard problem with O(n²) time complexity. We reformulate ER as a search problem and develop algorithms that use efficient indices. Indices can improve algorithm scalability and facilitate distributed processing, but they require additional storage space. We study the performance of ER algorithms and the tradeoffs between index update and search, and show that significant performance gains can be obtained by using indices.
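
To make the idea of treating ER as a search problem concrete, here is a minimal sketch. It is not the algorithm from the post: it assumes records are plain strings, uses a simple in-memory inverted index on whitespace tokens, and scores candidates with Jaccard similarity. The function names (`tokens`, `jaccard`, `naive_pairs`, `indexed_pairs`) and the sample records are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def tokens(record):
    """Lowercased whitespace tokens of a record (an assumed, very simple normalization)."""
    return frozenset(record.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two token sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def naive_pairs(records, threshold):
    """Baseline: compare every pair of records, which is O(n^2) comparisons."""
    toks = [tokens(r) for r in records]
    return {
        (i, j)
        for i, j in combinations(range(len(records)), 2)
        if jaccard(toks[i], toks[j]) >= threshold
    }


def indexed_pairs(records, threshold):
    """Index-based variant: each record is a query against an inverted index.

    Only records sharing at least one token with the query are compared.
    For any threshold > 0 this returns the same pairs as naive_pairs,
    because a pair with no shared token has similarity 0.
    """
    index = defaultdict(list)  # token -> ids of records indexed so far (extra storage)
    seen = []                  # token sets of records indexed so far
    matches = set()
    for i, record in enumerate(records):
        query = tokens(record)
        # Search step: collect candidate ids from the posting lists.
        candidates = set()
        for tok in query:
            candidates.update(index.get(tok, ()))
        for j in candidates:
            if jaccard(seen[j], query) >= threshold:
                matches.add((j, i))
        # Update step: add the current record to the index.
        for tok in query:
            index[tok].append(i)
        seen.append(query)
    return matches


# Hypothetical sample data, for illustration only.
records = [
    "John Smith Budapest",
    "john smith budapest hungary",
    "Jane Doe Vienna",
    "J. Smith",
]
result = indexed_pairs(records, 0.5)
```

The sketch shows both sides of the tradeoff mentioned above: the index costs additional memory and an update step per record, while the search step limits comparisons to records that share a token with the query. How much work this saves depends on the data; very frequent tokens produce long posting lists and many candidates.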