Cross-Language Code Search using Static and Dynamic Analyses
Wed 25 Aug 2021 20:00 - 20:10 - Analytics & Software Evolution—Code Recommendation Chair(s): Davide Di Ruscio, Saikat Chakraborty
As code search permeates most activities in software development, code-to-code search has emerged to support using code as a query and retrieving similar code in the search results. Applications include duplicate code detection for refactoring, patch identification for program repair, and language translation. Existing code-to-code search tools rely on static similarity approaches such as the comparison of tokens and abstract syntax trees (AST) to approximate dynamic behavior, leading to low precision. Most tools do not support cross-language code-to-code search, and those that do rely on machine learning models that require labeled training data.
We present Code-to-Code Search Across Languages (COSAL), a cross-language technique that uses both static and dynamic analyses to identify similar code and does not require a machine learning model. Code snippets are ranked using non-dominated sorting based on code token similarity, structural similarity, and behavioral similarity. We empirically evaluate COSAL on two datasets of 43,146 Java and Python files and 55,499 Java files and find that 1) code search based on non-dominated ranking of static and dynamic similarity measures is more effective than single or weighted measures; and 2) COSAL has better precision and recall than state-of-the-art within-language and cross-language code-to-code search tools. We explore the potential for using COSAL on large open-source repositories and discuss scalability to more languages and similarity metrics, providing a gateway for practical, multi-language code-to-code search.
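The non-dominated ranking mentioned in the abstract can be illustrated with a minimal sketch. This is not COSAL's implementation: the function names, the file names, and the assumption that each candidate carries three similarity scores in [0, 1] (token, structural, behavioral), with higher meaning more similar, are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Assumed score layout: (token, structural, behavioral) similarity, higher is better.
Scores = Tuple[float, float, float]


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is at least as good as b on every measure and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated_sort(candidates: Dict[str, Scores]) -> List[List[str]]:
    """Group candidates into Pareto fronts; the first front holds the top-ranked results."""
    remaining = dict(candidates)
    fronts: List[List[str]] = []
    while remaining:
        # A candidate belongs to the current front if no other remaining candidate dominates it.
        front = sorted(
            name
            for name, scores in remaining.items()
            if not any(
                dominates(other_scores, scores)
                for other, other_scores in remaining.items()
                if other != name
            )
        )
        fronts.append(front)
        for name in front:
            del remaining[name]
    return fronts


# Hypothetical search results for one query snippet.
candidates = {
    "a.java": (0.9, 0.8, 1.0),
    "b.py": (0.7, 0.9, 1.0),
    "c.py": (0.6, 0.7, 0.5),
    "d.py": (0.5, 0.5, 0.5),
}
print(non_dominated_sort(candidates))
```

Here `a.java` and `b.py` share the first front because each beats the other on one measure, which is the case a single weighted score would have to resolve with an arbitrary choice of weights.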
Wed 25 Aug (displayed time zone: Athens)
08:00 - 09:00 | Analytics & Software Evolution—Code Recommendation | Journal First / Research Papers | Chair(s): Davide Di Ruscio (University of L'Aquila), Saikat Chakraborty (Columbia University)
08:00 | 10m | Paper | Cross-Language Code Search using Static and Dynamic Analyses | Research Papers
08:10 | 10m | Paper | Automating the Removal of Obsolete TODO Comments | Research Papers | Zhipeng Gao (Monash University), Xin Xia (Huawei Technologies), David Lo (Singapore Management University), John Grundy (Monash University), Thomas Zimmermann (Microsoft Research)
08:20 | 10m | Paper | Generating Question Titles for Stack Overflow from Mined Code Snippets | Journal First | Zhipeng Gao (Monash University), Xin Xia (Huawei Technologies), John Grundy (Monash University), David Lo (Singapore Management University), Yuan-Fang Li (Monash University)
08:30 | 30m | Live Q&A | Q&A (Analytics & Software Evolution—Code Recommendation) | Research Papers
20:00 - 21:00 | Analytics & Software Evolution—Code Recommendation | Research Papers / Journal First | Chair(s): Davide Di Ruscio (University of L'Aquila), Saikat Chakraborty (Columbia University)
20:00 | 10m | Paper | Cross-Language Code Search using Static and Dynamic Analyses | Research Papers
20:10 | 10m | Paper | Automating the Removal of Obsolete TODO Comments | Research Papers | Zhipeng Gao (Monash University), Xin Xia (Huawei Technologies), David Lo (Singapore Management University), John Grundy (Monash University), Thomas Zimmermann (Microsoft Research)
20:20 | 10m | Paper | Generating Question Titles for Stack Overflow from Mined Code Snippets | Journal First | Zhipeng Gao (Monash University), Xin Xia (Huawei Technologies), John Grundy (Monash University), David Lo (Singapore Management University), Yuan-Fang Li (Monash University)
20:30 | 30m | Live Q&A | Q&A (Analytics & Software Evolution—Code Recommendation) | Research Papers