Using Similarity Digests and Approximate Matching to Measure Knowledge Reuse and Dissemination

Abstract

In this paper we introduce and give an overview of the textreuse method, highlight the challenges one faces in implementing it, and showcase illustrative cases in which textreuse is able to determine the origin and inheritance of a textual artifact. Applying textreuse and related methods often requires a diverse knowledge base: researchers must understand the topic at hand while also having a thorough grasp of computational methods and programming. Given that not all researchers are familiar with such computational methods, we not only showcase the textreuse method but also propose a framework that reduces the computational complexity of applying it, thereby increasing the utility and applicability of the textreuse method within the social sciences. The dataset consists of the Julia programming language code base, containing around 3000 user-built applications.

Publication
Working Paper
Coen van der Geest
PhD Candidate in Information Systems

My research interests include digital infrastructures, IT (Platform) Architectures and Application Programming Interfaces (APIs).