When it was first pitched during the Data-X lecture, “The Jokester” had a simple premise: how do we ensure that comedians are delivering original material? With so many comedians, particularly stand-up comics, accused of plagiarism, we knew there was a real problem worth addressing. As we gathered our team members, we knew the team had to share one united passion: a love of stand-up comedy. That passion for comedy and originality, combined with our interest in natural language processing, shaped the project. Because the full scope of the problem was vast, we narrowed it down and sourced all of our potentially “plagiarized” material from Netflix comedy specials. Netflix has a huge repository of comedy specials, and collecting our data from a single source standardized the process.

The Jokester can best be described as a “Turnitin” for comedy. Our goal with the project was twofold: to let comics check whether their material has already been performed by another comic, and to give them inspiration by surfacing jokes that have been done before. Through iteration, we ended up with a system that takes a chunk of text submitted by the user (around one paragraph) and checks it for plagiarism against our repository of famous Netflix specials. The system outputs a similarity score indicating how closely the user’s text matches the works in our database.
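As a rough illustration of this kind of comparison (not necessarily the exact pipeline in the repository linked below), the sketch that follows scores a submitted chunk of text against a small stand-in corpus using TF-IDF vectors and cosine similarity. The corpus strings and the score_submission helper are hypothetical placeholders for the real transcript data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical stand-in for the transcript repository; the real project
# would load excerpts from Netflix comedy-special transcripts instead.
corpus = [
    "I told my kids about airplane food and they just stared at me.",
    "Why do hotels fold the toilet paper into a little triangle?",
]

def score_submission(submission: str, corpus: list[str]) -> list[float]:
    """Return a cosine-similarity score between the submission and each corpus entry."""
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform(corpus + [submission])
    # The last row of the TF-IDF matrix is the submission; compare it to every corpus row.
    scores = cosine_similarity(matrix[-1], matrix[:-1])
    return scores.flatten().tolist()

if __name__ == "__main__":
    new_bit = "Hotels always fold the toilet paper into a tiny triangle, why is that?"
    for text, score in zip(corpus, score_submission(new_bit, corpus)):
        print(f"{score:.2f}  {text}")
```

A higher cosine score means more lexical overlap with a stored bit; a production system would likely pair this kind of surface matching with fuzzier semantic comparison, since a stolen joke can be rephrased heavily while keeping the same premise.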

https://github.com/chanvarmacal/jokester-final