data integration

    Longneck Data Integration

    Longneck Data Integration goes open source and free!

    Longneck at the BI Forum 2013, Open Analytics Day

    Sidló Csaba
    11/06/2013 - 16:00

    How to implement a data quality tool, and how we did it with our tool called Longneck.

    Entity Resolution with Heavy Indexing

    Entity resolution (ER), or deduplication, is a computationally hard problem with O(n²) time complexity when every record is compared with every other. We reformulate ER as a search problem and develop algorithms that use efficient indices. Indices can enhance algorithm scalability and facilitate distributed processing, but they require additional storage space. We study the performance of ER algorithms and the tradeoffs between index update and search, and show that a significant performance gain can be obtained by using indices.
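
    To make the idea of ER as a search problem concrete, here is a minimal sketch in Python. It is not Longneck's or the paper's algorithm: the token-based inverted index, the `similar` match rule, the 0.8 threshold, and the sample records are all assumptions chosen for illustration. Each incoming record first searches the index for candidates that share a token, is compared only against those, and is then added to the index (the update step).

```python
from collections import defaultdict
from difflib import SequenceMatcher


def tokens(record):
    """Index keys for a record: its lowercase word tokens (an assumed, simple key)."""
    return set(record.lower().split())


def similar(a, b, threshold=0.8):
    """Assumed match rule: normalized string similarity at or above a threshold."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def resolve(records, threshold=0.8):
    """Assign each record to an entity by searching an inverted index for candidates."""
    index = defaultdict(set)  # token -> ids of records indexed so far
    entity_of = {}            # record id -> entity id
    comparisons = 0
    for rid, rec in enumerate(records):
        # Search step: only records sharing at least one token are candidates.
        candidates = set()
        for tok in tokens(rec):
            candidates |= index[tok]
        match = None
        for cid in sorted(candidates):
            comparisons += 1
            if similar(rec, records[cid], threshold):
                match = entity_of[cid]
                break
        # A record with no match starts a new entity of its own.
        entity_of[rid] = match if match is not None else rid
        # Update step: add the new record to the index (this costs extra storage).
        for tok in tokens(rec):
            index[tok].add(rid)
    return entity_of, comparisons


# Hypothetical sample data, for illustration only.
records = [
    "John Smith Budapest",
    "Jon Smith Budapest",
    "Mary Jones Szeged",
    "Mary Jones Szeged",
    "Peter Kovacs Debrecen",
]
entity_of, comparisons = resolve(records)
print(entity_of, comparisons)
```

    On this sample the index limits the work to two pairwise comparisons, where comparing all pairs of five records would take ten. The price is the memory held by `index` and the update work done for every record, which is the tradeoff the abstract refers to.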