Testing and static analysis can help root out bugs in programs, but not in data. This paper introduces data debugging, an approach that combines program analysis and statistical analysis to find potential data errors. Since it is impossible to know a priori whether data is erroneous, data debugging locates data that has an unusual impact on the computation. Such data is either very important or wrong. Data debugging is especially useful in data-intensive programming environments that intertwine data with programs in the form of queries or formulas. We present the first data debugging tool, CheckCell, an add-in for Microsoft Excel. CheckCell identifies cells that have an unusually high impact on the spreadsheet’s computations. We show that CheckCell is both analytically and empirically fast and effective. We show that it successfully finds injected typographical errors produced by a generative model trained with data entry from 100,000 Mechanical Turk tasks. CheckCell also automatically identifies a key flaw in the infamous Reinhart and Rogoff spreadsheet.
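The idea of flagging inputs with an unusual impact on a computation can be illustrated with a minimal sketch. This is not CheckCell's actual algorithm, which the abstract does not spell out. It assumes a simple leave-one-out measure of impact and a median-absolute-deviation cutoff, and the names `impact_scores`, `flag_unusual`, and `threshold` are hypothetical.

```python
from statistics import median

def impact_scores(values, formula):
    """Leave-one-out impact: change in the formula's output when one value is dropped."""
    baseline = formula(values)
    scores = []
    for i in range(len(values)):
        rest = values[:i] + values[i + 1:]
        scores.append(abs(formula(rest) - baseline))
    return scores

def flag_unusual(values, formula, threshold=5.0):
    """Return indices whose impact lies far above the typical impact.

    Assumption: "unusual" means more than `threshold` median absolute
    deviations (MAD) above the median impact. This cutoff is illustrative only.
    """
    scores = impact_scores(values, formula)
    med = median(scores)
    mad = median(abs(s - med) for s in scores)
    if mad == 0:
        # All typical impacts are identical; flag anything strictly larger.
        return [i for i, s in enumerate(scores) if s > med]
    return [i for i, s in enumerate(scores) if (s - med) / mad > threshold]

def average(xs):
    return sum(xs) / len(xs)

# Hypothetical spreadsheet column; 100.0 mimics a typo for 10.0.
cells = [10.2, 9.8, 10.1, 10.0, 100.0, 9.9]
print(flag_unusual(cells, average))  # prints [4]
```

A flagged cell is only a candidate for inspection: as the abstract notes, a high-impact value may be very important rather than wrong.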
Thu 23 Oct (displayed time zone: Tijuana, Baja California)
15:30 - 17:00
15:30 | 22m | Talk | CheckCell: Data Debugging for Spreadsheets | OOPSLA | Dan Barowy (University of Massachusetts, Amherst), Dimitar Gochev (University of Massachusetts, Amherst), Emery D. Berger (University of Massachusetts, Amherst)
15:52 | 22m | Talk | Finding Minimum Type Error Sources | OOPSLA | Zvonimir Pavlinovic (New York University), Tim King (New York University), Thomas Wies (New York University)
16:15 | 22m | Talk | Flint: Fixing Linearizability Violations | OOPSLA | Peng Liu (Purdue University), Omer Tripp (IBM Thomas J. Watson Research Center), Xiangyu Zhang (Purdue University)
16:37 | 22m | Talk | Statistical Debugging for Real-World Performance Problems | OOPSLA |