Source code plagiarism detection for the PHP language

  • Richard Všianský Mendel University in Brno
  • Dita Dlabolová Mendel University in Brno
  • Tomáš Foltýnek Mendel University in Brno
Keywords: source-code plagiarism, anti-plagiarism system, PHP, Anton

Abstract

This paper introduces a system for detecting plagiarism in source code written in PHP, implemented as part of the plagiarism detection tool Anton. The system combines tokenization and hash calculation with the greedy string tiling algorithm. Its effectiveness was tested on both an artificial dataset and real data from a course taught at our university. We compare our results with those of other similar systems and solutions and conclude that Anton detects all examined types of plagiarism with higher accuracy than the other systems.
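To illustrate how tokenization, hashing and greedy string tiling can fit together, here is a minimal sketch in Python. It is not Anton's implementation: the regex-based tokenizer, the token classes, the keyword list, the `min_match` threshold and the similarity formula are all assumptions chosen for illustration. A real system would rely on a full PHP lexer and would handle comments, which this sketch does not.

```python
import re

# Hypothetical, much-simplified token classes (assumption, not a real PHP lexer).
TOKEN_RE = re.compile(r"""
    (?P<VAR>\$[A-Za-z_]\w*)
  | (?P<NUM>\d+(?:\.\d+)?)
  | (?P<STR>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')
  | (?P<WORD>[A-Za-z_]\w*)
  | (?P<OP>[^\s\w])
""", re.VERBOSE)

KEYWORDS = {"if", "else", "while", "for", "foreach", "function", "return", "echo"}


def tokenize(source):
    """Map source text to abstract tokens so that renaming identifiers has no effect."""
    tokens = []
    for m in TOKEN_RE.finditer(source):
        kind, text = m.lastgroup, m.group()
        if kind == "WORD":
            tokens.append(text.lower() if text.lower() in KEYWORDS else "ID")
        elif kind == "OP":
            tokens.append(text)      # keep operators and punctuation verbatim
        else:
            tokens.append(kind)      # VAR, NUM, STR collapse to their class name
    return tokens


def greedy_string_tiling(a, b, min_match=3):
    """Return tiles (start_a, start_b, length): non-overlapping common token runs."""
    marked_a, marked_b = [False] * len(a), [False] * len(b)
    # Hash every run of min_match tokens in b once; hashes only propose candidates.
    index = {}
    for j in range(len(b) - min_match + 1):
        index.setdefault(hash(tuple(b[j:j + min_match])), []).append(j)
    tiles = []
    while True:
        max_len, matches = min_match, []
        for i in range(len(a) - min_match + 1):
            for j in index.get(hash(tuple(a[i:i + min_match])), []):
                k = 0  # extend the match token by token over unmarked positions
                while (i + k < len(a) and j + k < len(b)
                       and a[i + k] == b[j + k]
                       and not marked_a[i + k] and not marked_b[j + k]):
                    k += 1
                if k > max_len:
                    max_len, matches = k, [(i, j, k)]
                elif k == max_len:
                    matches.append((i, j, k))
        if not matches:
            return tiles
        for i, j, k in matches:
            # Skip matches that overlap a tile marked earlier in this round.
            if not any(marked_a[i:i + k]) and not any(marked_b[j:j + k]):
                marked_a[i:i + k] = [True] * k
                marked_b[j:j + k] = [True] * k
                tiles.append((i, j, k))


def similarity(a, b, min_match=3):
    """Share of tokens covered by tiles (an assumed, commonly used measure)."""
    if not a and not b:
        return 0.0
    covered = sum(length for _, _, length in greedy_string_tiling(a, b, min_match))
    return 2 * covered / (len(a) + len(b))


if __name__ == "__main__":
    original = "<?php function sum($a, $b) { $c = $a + $b; return $c; }"
    renamed = "<?php function add($x, $y) { $z = $x + $y; return $z; }"
    print(similarity(tokenize(original), tokenize(renamed)))
```

In this sketch, two fragments that differ only in identifier names yield identical token sequences, so tiling operates on program structure rather than on surface text. The hashes serve only to shortlist candidate start positions; every candidate is still verified token by token.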

Published
2017-12-31
Section
Articles