Using Similarity Digests and Approximate Matching to Measure Knowledge Reuse and Dissemination

Mahmood Shafeie Zargar, Coen van der Geest, Tomer Iwan

January 2022

Abstract

In this paper we introduce and give an overview of the textreuse method, highlight the challenges of implementing it, and present illustrative cases in which textreuse is able to determine the origin and inheritance of a textual artifact. Applying textreuse and related methods often requires a diverse knowledge base: an understanding of the topic at hand combined with a thorough understanding of computational methods and programming. Given that not all researchers are familiar with such computational methods, we not only showcase the textreuse method but also propose a framework that reduces the computational complexity of applying it, thereby increasing the utility and applicability of the textreuse method within the social sciences. The dataset consists of the Julia programming language code base, containing around 3000 user-built applications.

Type

Manuscript

Publication

Working Paper

Coen van der Geest

PhD Candidate in Information Systems