The amount of code available on the web grows daily. Open-source hosting sites such as GitHub contain billions of lines of code. Community question-answering sites provide millions of code snippets with corresponding text and metadata. The amount of code available in executable binaries is even greater. In this talk, I will cover recent research trends in leveraging such “big code” for program analysis, program synthesis and reverse engineering. Along the way, we will consider a range of semantic program representations based on symbolic automata, tracelets and numerical abstractions, as well as different notions of code similarity based on these representations. Finally, I will show applications of these techniques, including semantic code search in both source code and stripped binaries, code completion and reverse engineering.
Track: ECOOP Summer School
Eran Yahav is an associate professor in the Computer Science Department, Technion, Israel. Prior to that, he was a research staff member at the IBM T.J. Watson Research Center in New York (2004-2010). He received his Ph.D. from Tel Aviv University (2005) and his B.Sc. from the Technion (1996). His research interests include program analysis, program synthesis and program verification. Eran is a recipient of the prestigious Alon Fellowship for Outstanding Young Researchers, the Andre Deloro Career Advancement Chair in Engineering, and the ERC Consolidator Grant, as well as multiple best paper awards at various conferences.