Supporting program comprehension with source code summarization icse nier 2010

Supporting Program Comprehension with Source Code Summarization Sonia Haiduc*, Jairo Aponte**, Andrian Marcus* ICSE NIER 2010 * **

Developers read source code • Before performing maintenance on a system, developers need to understand its source code • During comprehension, programmers search and browse the code

Skimming vs. reading code • Skimming (Starke’09): quickly reading the names of software artifacts + Fast – Insufficient information – Shallow understanding • Reading in depth – Slow – Too much information + Deeper understanding

Code summaries • Automatically generated, short, yet accurate descriptions of source code entities • They give more information than just the header or the name of an artifact • Significantly shorter and faster to read than the source code they summarize

What should we summarize? • Code – Packages – Classes – Methods – Method sequences – Etc. • Other artifacts – Bug reports (ICSE 2010 - S. Rastakar, G. Murphy, G. Murray) – E-mails – Etc.

What should we include in code summaries? • Semantic information – What does the source code do? – Identifiers and comments that capture the main concepts • Structural information – How does the code work? – Class relationships, callers and callees, members of a class, etc.

Description: VFS virtual file system read write mkdir directory path save + Internal classes: DirectoryEntry + Methods: listDirectory, mkdir, constructPath + Fields: WRITE_CAP, READ_CAP, lock + Sub-classes: FileVFS, FavoritesVFS + Other: ...

How should we generate code summaries? • Semantic information: automatic text summarization – Machine Learning – Discourse-based approaches – Term-based Text Retrieval techniques • Structural information: static analysis

How can we evaluate code summaries? • How good are the automatic summaries when compared to manual ones? • How useful are the automatic code summaries for SE tasks?

Preliminary evaluation • Compared automatic code summaries with developer code summaries • 6 developers, 12 methods in ATunes • Used only lexical information – 5 most relevant terms

Results • Automatic source code summaries good in reflecting developers’ summaries • Text Retrieval techniques work as well on source code as on natural language in reflecting human summaries • Developers make use of structural information in their code summaries: – Method name terms – Class name terms – Formal parameter types terms

What are we doing now? • What type and how much structural information should be included in code summaries? • How do developers generate summaries? • Are different summaries needed for different tasks? • How useful are the code summaries for SE tasks?, etc.

In summary… • Automatic code summaries: – Short yet accurate descriptions of source code – Can reduce the effort of program comprehension – Embed both semantic and structural information – Can be generated for a variety of software entities • Visit my poster (HINT: look for the huge and colorful one) • www.cs.wayne.edu/~severe and www.cs.wayne.edu/~shaiduc • sonja@wayne.edu

Supporting program comprehension with source code summarization icse nier 2010

More Related Content

What's hot

Similar to Supporting program comprehension with source code summarization icse nier 2010

Recently uploaded

Supporting program comprehension with source code summarization icse nier 2010