During the recent EIDR Annual Participant Meeting in Los Angeles, Hollie Choi, EIDR’s managing director, and Richard Kroon, EIDR’s technology director, shared insights into recent enhancements to the EIDR registry, listed the top five data quality issues EIDR is facing, and offered a look at EIDR’s new search capabilities.
In the presentation “Quality In, Quality Out: Inside EIDR’s Registry Enhancements,” Choi and Kroon covered the key issues identified (including title language identification), the solutions implemented, and the ongoing efforts to strengthen the accuracy and reliability of the EIDR registry. They also unveiled the newly redesigned public search interface, which is designed to make it easier to explore and access high-quality registry data.
The registry faces several language challenges: more than 540,000 titles currently have an undetermined language in the data; native-script and transliterated titles cannot be identified as matching; and de-duplication is inaccurate when company names and credits are in a mix of different scripts. Using Google’s Language APIs, the new EIDR language tool detects unidentified title languages, identifies records where the detected language does not match the provided language, transliterates non-Latin titles and changes them to Primary, and more.
Future improvements will include language detection upon submission and the use of additional APIs to increase coverage.
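As a rough illustration of the kind of check described above (flagging titles with an undetermined language, or whose text does not fit the language provided), the sketch below uses a simple Unicode-script heuristic from Python's standard library. It is not EIDR's tool and does not call Google's Language APIs; the record layout (`title`, `language`), the `EXPECTED_SCRIPTS` table, and the function names are all hypothetical, chosen only for this example.

```python
import unicodedata
from collections import Counter

# Hypothetical table: scripts in which each language is normally written.
EXPECTED_SCRIPTS = {
    "en": {"LATIN"},
    "fr": {"LATIN"},
    "ru": {"CYRILLIC"},
    "ja": {"CJK", "HIRAGANA", "KATAKANA"},
}

def dominant_script(title):
    """Return the most common script among the letters of a title, or None."""
    counts = Counter()
    for ch in title:
        if ch.isalpha():
            # Unicode character names start with the script, e.g. "CYRILLIC ...".
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split(" ")[0]] += 1
    return counts.most_common(1)[0][0] if counts else None

def classify(record):
    """Label a (hypothetical) title record by comparing script and language."""
    language = record.get("language") or "und"  # "und" = undetermined (ISO 639-2)
    if language == "und":
        return "undetermined"
    expected = EXPECTED_SCRIPTS.get(language)
    if expected is None:
        return "unchecked"  # language not covered by this small table
    script = dominant_script(record["title"])
    return "ok" if script in expected else "mismatch"

records = [
    {"title": "Москва слезам не верит", "language": "en"},
    {"title": "Amélie", "language": "fr"},
    {"title": "Seven Samurai", "language": "und"},
]
for rec in records:
    print(rec["title"], "->", classify(rec))
```

A script check like this can only catch gross mismatches, such as a Cyrillic title labeled as English. Telling apart languages that share a script would need a real language-detection service of the kind the presentation describes.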
To listen to the EIDR presentation, click here. To view the presentation slide deck, click here.

