Testing and static analysis can help root out bugs in programs, but not in data. This paper introduces data debugging, an approach that combines program analysis and statistical analysis to find potential data errors. Since it is impossible to know a priori whether data are erroneous or not, data debugging locates data that has an unusual impact on the computation. Such data is either very important, or wrong. Data debugging is especially useful in the context of data-intensive programming environments that intertwine data with programs in the form of queries or formulas. We present the first data debugging tool, CheckCell, an add-in for Microsoft Excel. CheckCell identifies cells that have an unusually high impact on the spreadsheet’s computations. We show that CheckCell is both analytically and empirically fast and effective. We show that it successfully finds injected typographical errors produced by a generative model trained with data entry from 100,000 Mechanical Turk tasks. CheckCell also automatically identifies a key flaw in the infamous Reinhart and Rogoff spreadsheet.
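The idea of locating data with an unusual impact on a computation can be illustrated with a small sketch. This is not CheckCell's algorithm, which the abstract does not spell out. It is a simplified leave-one-out illustration under stated assumptions: `impact_scores`, `flag_unusual`, and the `ratio` threshold are hypothetical names and choices, and `formula` stands in for a spreadsheet formula over a range of numeric cells.

```python
from statistics import median


def impact_scores(values, formula):
    """For each input, measure how far the formula's output moves when that
    input is replaced by the median of the remaining inputs.

    Hypothetical helper for illustration; not taken from CheckCell.
    """
    baseline = formula(values)
    scores = []
    for i in range(len(values)):
        others = values[:i] + values[i + 1:]
        # Substitute a "typical" value for the cell under test.
        trial = values[:i] + [median(others)] + values[i + 1:]
        scores.append(abs(formula(trial) - baseline))
    return scores


def flag_unusual(values, formula, ratio=5.0):
    """Return indices whose impact exceeds `ratio` times the median impact.

    The threshold is an arbitrary assumption chosen for this sketch.
    """
    scores = impact_scores(values, formula)
    typical = median(scores)
    return [i for i, s in enumerate(scores) if s > 0 and s > ratio * typical]


def average(cells):
    """Stand-in for a spreadsheet formula such as an average over a range."""
    return sum(cells) / len(cells)


if __name__ == "__main__":
    # 110 may be a typo for 10 or 11; it dominates the average.
    cells = [10, 12, 11, 9, 110, 10, 11]
    print(flag_unusual(cells, average))
```

A flagged index only means the value is influential. As the abstract notes, such a value is either very important or wrong, so the decision is left to the user.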
Thu 23 Oct (displayed time zone: Tijuana, Baja California)
| 15:30 - 17:00 | | |
| 15:30 (22m), Talk | CheckCell: Data Debugging for Spreadsheets (OOPSLA) | Dan Barowy (University of Massachusetts, Amherst); Dimitar Gochev (University of Massachusetts, Amherst); Emery D. Berger (University of Massachusetts, Amherst) |
| 15:52 (22m), Talk | Finding Minimum Type Error Sources (OOPSLA) | Zvonimir Pavlinovic (New York University); Tim King (New York University); Thomas Wies (New York University) |
| 16:15 (22m), Talk | Flint: Fixing Linearizability Violations (OOPSLA) | Peng Liu (Purdue University); Omer Tripp (IBM Thomas J. Watson Research Center); Xiangyu Zhang (Purdue University) |
| 16:37 (22m), Talk | Statistical Debugging for Real-World Performance Problems (OOPSLA) | |

