data integration

    Longneck Data Integration

    Our Longneck Data Integration tool is now open source and free!

    Longneck at the BI Fórum 2013 conference

    Sidló Csaba
    2013-11-06 16:00

    Designing a data quality improvement tool, illustrated through Longneck, our open source data integration and data quality tool.

    Entity Resolution with Heavy Indexing

    Entity resolution (ER), or deduplication, is a computationally hard problem with O(n^2) time complexity. We reformulate ER as a search problem and develop algorithms that use efficient indices. Indices can improve algorithm scalability and facilitate distributed processing, but they require additional storage space. We study the performance of ER algorithms and the tradeoff between index update and search, and show that using indices can yield significant performance gains.
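    To make the "ER as search" idea concrete, here is a minimal sketch contrasting an all-pairs baseline with an index-based variant. It is not the authors' algorithm: the token-based inverted index (used as a blocking key), the difflib similarity rule with a 0.8 threshold, and the function names resolve_pairwise and resolve_indexed are all hypothetical choices for illustration. Each incoming record is first used as a query against the index (search), then added to it (index update), which mirrors the update/search tradeoff described above.

```python
from __future__ import annotations

from collections import defaultdict
from difflib import SequenceMatcher


def tokens(record: str) -> set[str]:
    """Hypothetical blocking key: lowercase whitespace-separated tokens."""
    return set(record.lower().split())


def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Hypothetical match rule: difflib similarity ratio above a threshold."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def resolve_pairwise(records: list[str]) -> list[tuple[int, int]]:
    """Baseline: compare every pair of records, O(n^2) comparisons."""
    return [(i, j)
            for i in range(len(records))
            for j in range(i + 1, len(records))
            if similar(records[i], records[j])]


def resolve_indexed(records: list[str]) -> list[tuple[int, int]]:
    """ER as search: query an inverted index for candidates, then update it."""
    index: dict[str, set[int]] = defaultdict(set)
    matches: list[tuple[int, int]] = []
    for j, rec in enumerate(records):
        toks = tokens(rec)
        # Search step: only earlier records sharing a token become candidates.
        candidates: set[int] = set()
        for t in toks:
            candidates |= index.get(t, set())
        for i in sorted(candidates):
            if similar(records[i], rec):
                matches.append((i, j))
        # Update step: make this record findable by later queries.
        for t in toks:
            index[t].add(j)
    return matches


if __name__ == "__main__":
    data = ["John Smith", "Jon Smith", "Mary Jones", "Marry Jones"]
    print(resolve_pairwise(data))
    print(resolve_indexed(data))
```

With these four records both functions report the pairs (0, 1) and (2, 3), but the indexed version runs only two similarity checks instead of six. The design choice has a cost: the index uses extra memory, and two similar records that share no token are never compared, so the blocking key has to be chosen with recall in mind.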