PreprintResolver: Improving Citation Quality by Resolving Published Versions of ArXiv Preprints Using Literature Databases

Fast facts

  • Further authors

    • Louise Bloch
  • Year of publication

    • 2023
  • Anthology

    PreprintResolver: Improving Citation Quality by Resolving Published Versions of ArXiv Preprints Using Literature Databases

  • Organizational unit

  • Subjects

    • Computer science in general
  • Publication format

    Conference paper

Citation

Bloch, Louise, Rückert, Johannes & Friedrich, Christoph M. 2023. PreprintResolver: Improving Citation Quality by Resolving Published Versions of ArXiv Preprints Using Literature Databases. Linking Theory and Practice of Digital Libraries, 47-61.

Content

The growing impact of preprint servers enables the rapid sharing of time-sensitive research. At the same time, it is becoming increasingly difficult to distinguish high-quality, peer-reviewed research from preprints. Although preprints are often later published in peer-reviewed journals, this information is frequently missing from preprint servers. To overcome this problem, the PreprintResolver was developed. It uses four literature databases (DBLP, SemanticScholar, OpenAlex, and CrossRef/CrossCite) to identify preprint-publication pairs for the arXiv preprint server. The target audience includes, but is not limited to, inexperienced researchers and students, especially in computer science. The tool is based on fuzzy matching of author surnames, titles, and DOIs. Experiments were performed on a sample of 1,000 computer science arXiv preprints that lacked any publication information. At 77.94%, the share of arXiv preprints without publication information is high in computer science. The results show that the PreprintResolver resolved 603 of the 1,000 sampled preprints (60.3%). All four literature databases contributed to the final result. In a manual validation, a random sample of 100 resolved preprints was checked. For every preprint in this sample, at least one result was plausible. For nine preprints, more than one result was identified; for three of these, the results were partially invalid. In conclusion, the PreprintResolver is suitable for individual, manually reviewed requests but less suitable for bulk requests. The PreprintResolver tool (https://preprintresolver.eu) and its source code (https://gitlab.com/ippolis_wp3/preprint-resolver) are available online.
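
The fuzzy matching of author surnames, titles, and DOIs mentioned above can be illustrated with a minimal Python sketch. This is not the PreprintResolver implementation. The record layout, the function names, the thresholds (0.9 and 0.5), the use of `difflib.SequenceMatcher`, and the use of DOIs to merge duplicate candidates from different databases are all assumptions made for illustration.

```python
import unicodedata
from difflib import SequenceMatcher
from typing import Dict, List


def normalize(text: str) -> str:
    """Lowercase, strip accents, and reduce punctuation to single spaces."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in no_accents.lower())
    return " ".join(cleaned.split())


def title_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized titles."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def surname_overlap(a: List[str], b: List[str]) -> float:
    """Jaccard overlap of two normalized surname sets."""
    set_a = {normalize(s) for s in a}
    set_b = {normalize(s) for s in b}
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def resolve(preprint: Dict, candidates: List[Dict],
            title_threshold: float = 0.9,      # assumed value
            surname_threshold: float = 0.5     # assumed value
            ) -> List[Dict]:
    """Return plausible published versions, merged by DOI (hypothetical logic)."""
    seen_dois = set()
    matches = []
    for cand in candidates:
        if title_similarity(preprint["title"], cand["title"]) < title_threshold:
            continue
        if surname_overlap(preprint["surnames"], cand["surnames"]) < surname_threshold:
            continue
        # DOIs are case-insensitive, so compare them in lowercase.
        doi = cand.get("doi", "").lower()
        if doi and doi in seen_dois:
            continue  # same publication already reported by another database
        if doi:
            seen_dois.add(doi)
        matches.append(cand)
    return matches


# Hypothetical records; the DOIs below are placeholders, not real identifiers.
preprint = {"title": "PreprintResolver: Improving Citation Quality",
            "surnames": ["Bloch", "Rückert", "Friedrich"]}
candidates = [
    {"title": "PreprintResolver - improving citation quality",
     "surnames": ["Bloch", "Ruckert", "Friedrich"], "doi": "10.1234/EXAMPLE"},
    {"title": "PreprintResolver: Improving Citation Quality",
     "surnames": ["Bloch", "Rückert", "Friedrich"], "doi": "10.1234/example"},
    {"title": "A completely different paper",
     "surnames": ["Smith"], "doi": "10.1234/other"},
]
resolved = resolve(preprint, candidates)
```

In this sketch, normalization makes the comparison robust to case, punctuation, and diacritics, which tend to differ between literature databases. Because several candidates can pass both thresholds, the output is best treated as a list of suggestions to review manually, in line with the conclusion above.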
