Ensuring that a program operates correctly is a difficult task in large, complex systems. Enshrining invariants – desired properties of correct execution – in code or comments can support maintainability and help sustain correctness. Tools that can automatically infer and recommend invariants can thus be very beneficial. However, current invariant-suggesting tools, such as Daikon, suffer from high rates of false positives, in part because they only leverage traced program values from available test cases, rather than directly exploiting knowledge of the source code per se. We propose a machine-learning approach to judging the validity of invariants, specifically of method pre- and post-conditions, based directly on a method’s source code. We introduce a new, scalable approach to creating labeled invariants: using programs with large test-suites, we generate Daikon invariants using traces from subsets of these test-suites, and then label these as valid/invalid by cross-validating them with held-out tests. This process induces a large set of labels that provide a form of noisy supervision, which is then used to train a deep neural model, based on gated graph neural networks. Our model learns to map the lexical, syntactic, and semantic structure of a given method’s body into a probability that a candidate pre- or post-condition on that method’s body is correct and is able to accurately label invariants based on the noisy signal, even in cross-project settings. Most importantly, it performs well on a hand-curated dataset of invariants.
Generative models for source code are an interesting structured prediction problem, requiring to reason about both hard syntactic and semantic constraints as well as about natural, likely programs. We present a novel model for this problem that uses a graph to represent the intermediate state of the generated output. The generative procedure interleaves grammar-driven expansion steps with graph augmentation and neural message passing steps. An experimental evaluation shows that our new model can generate semantically meaningful expressions, outperforming a range of strong baselines.
We present a sequence-to-action parsing approach for the natural language to SQL task that incrementally fills the slots of a SQL query with feasible actions from a pre-defined inventory. To account for the fact that typically there are multiple correct SQL queries with the same or very similar semantics, we draw inspiration from syntactic parsing techniques and propose to train our sequence-to-action models with non-deterministic oracles. We evaluate our models on the WikiSQL dataset and achieve an execution accuracy of 83.7% on the test set, a 2.1% absolute improvement over the model trained with traditional static oracles assuming a single correct target SQL query. When further combined with the execution-guided decoding strategy, our model sets a new state-of-the-art performance at an execution accuracy of 87.1%.
We consider the problem of neural semantic parsing, which translates natural language questions into executable SQL queries. We introduce a new mechanism, execution guidance, to leverage the semantics of SQL. It detects and excludes faulty programs during the decoding procedure by conditioning on the execution of partially generated program. The mechanism can be used with any autoregressive generative model, which we demonstrate on four state-of-the-art recurrent or template-based semantic parsing models. We demonstrate that execution guidance universally improves model performance on various text-to-SQL datasets with different scales and query complexity: WikiSQL, ATIS, and GeoQuery. As a result, we achieve new state-of-the-art execution accuracy of 83.8% on WikiSQL.
We present a neural semantic parser that translates natural language questions into executable SQL queries with two key ideas. First, we develop an encoder-decoder model, where the decoder uses a simple type system of SQL to constraint the output prediction, and propose a value-based loss when copying from input tokens. Second, we explore using the execution semantics of SQL to repair decoded programs that result in runtime error or return empty result. We propose two model-agnostics repair approaches, an ensemble model and a local program repair, and demonstrate their effectiveness over the original model. We evaluate our model on the WikiSQL dataset and show that our model achieves close to state-of-the-art results with lesser model complexity.
We address the problem of learning a syntactic profile for a collection of strings, i.e. a set of regex-like patterns that succinctly describe the syntactic variations in the strings. Real-world datasets, typically curated from multiple sources, often contain data in various syntactic formats. Thus, any data processing task is preceded by the critical step of data format identification. However, manual inspection of data to identify the different formats is infeasible in standard big-data scenarios.
To the best of our knowledge, prior techniques are restricted to a small set of pre-defined patterns (e.g. digits, letters, words etc.), and provide no control over granularity of profiles. We define syntactic profiling as a problem of clustering strings based on syntactic similarity, followed by identifying patterns that succinctly describe each cluster. We present a technique for synthesizing such profiles over a given language of patterns, that also allows for interactive refinement by requesting a desired number of clusters.
Using a state-of-the-art inductive synthesis framework, PROSE, we have implemented our technique as FlashProfile. Across 153 tasks over 75 large real datasets, we observe a median profiling time of only ∼0.7s. Furthermore, we show that access to syntactic profiles may allow for more accurate synthesis of programs, i.e. using fewer examples, in programming-by-example (PBE) workflows such as Flash Fill.
Synthesizing user-intended programs from a small number of input-output examples is a challenging problem with several important applications like spreadsheet manipulation, data wrangling and code refactoring. Existing synthesis systems either completely rely on deductive logic techniques that are extensively hand-engineered or on purely statistical models that need massive amounts of data, and in general fail to provide real-time synthesis on challenging benchmarks. In this work, we propose Neural-Guided Deductive Search (NGDS), a hybrid synthesis technique that combines the best of both symbolic logic techniques and statistical models. Thus, it produces programs that satisfy the provided specifications by construction and generalize well on unseen examples, similar to data-driven systems. Our technique effectively utilizes the deductive search framework to reduce the learning problem of the neural component to a simple supervised learning setup. Further, this allows us to both train on sparingly available real-world data and still leverage powerful recurrent neural network encoders. We demonstrate the effectiveness of our method by evaluating on real-world customer scenarios by synthesizing accurate programs with up to 12× speed-up compared to state-of-the-art systems.
Program synthesis is the task of automatically finding a program in the underlying programming language that satisfies the user intent expressed in the form of some specification. Since the inception of AI in the 1950s, this problem has been considered the holy grail of Computer Science. Despite inherent challenges in the problem such as ambiguity of user intent and a typically enormous search space of programs, the field of program synthesis has developed many different techniques that enable program synthesis in different real-life application domains. It is now used successfully in software engineering, biological discovery, computer-aided education, end-user programming, and data cleaning. In the last decade, several applications of synthesis in the field of programming by examples have been deployed in mass-market industrial products.
This survey is a general overview of the state-of-the-art approaches to program synthesis, its applications, and subfields. We discuss the general principles common to all modern synthesis approaches such as syntactic bias, oracle-guided inductive search, and optimization techniques. We then present a literature review covering the four most common state-of-the-art techniques in program synthesis: enumerative search, constraint solving, stochastic search, and deduction-based programming by examples. We conclude with a brief list of future horizons for the field.
Programming by examples (PBE), or inductive program synthesis, is a problem of finding a program in the underlying domain-specific language (DSL) that is consistent with the given input-output examples or constraints. In the last decade, it has gained a lot of prominence thanks to the mass-market deployments of several PBE-based technologies for data wrangling – the widespread problem of transforming raw datasets into a structured form, more amenable to analysis. However, deployment of a mass-market application of program synthesis is challenging. First, an efficient implementation requires non-trivial engineering insight, often overlooked in a research prototype. This insight takes the form of domain-specific knowledge and heuristics, which complicate the implementation, extensibility, and maintenance of the underlying synthesis algorithm. Second, application development should be fast and agile, tailoring to versatile market requirements. Third, the underlying synthesis algorithm should be accessible to the engineers responsible for product maintenance.
In this work, I show how to generalize the ideas of 10+ previous specialized inductive synthesizers into a single framework, which facilitates automatic generation of a domain-specific synthesizer from the mere definition of the corresponding DSL and its properties. PROSE (PROgram Synthesis using Examples) is the first program synthesis framework that explicitly separates domain-agnostic search algorithms from domain-specific expert insight,making the resulting synthesizer both fast and accessible. The underlying synthesis algorithm pioneers the use of deductive reasoning for designer-defined domain-specific operators and languages, which enables mean synthesis times of 1-3 sec on real-life datasets.
A dedicated team at Microsoft has built and deployed 10+ technologies on top of the PROSE framework. Using them as case studies, I examine the user interaction challenges that arise after a mass-market deployment of a PBE-powered application. I show how expressing program synthesis as an interactive problem facilitates user intent disambiguation, incremental learning from additional examples, and increases the users’ confidence in the system.
Program synthesis from incomplete specifications (e.g. input-output examples) has gained popularity and found real-world applications, primarily due to its ease-of-use. Since this technology is often used in an interactive setting, efficiency and correctness are often the key user expectations from a system based on such technologies. Ensuring efficiency is challenging since the highly combinatorial nature of program synthesis algorithms does not fit in a 1-2 second response expectation of a user-facing system. Meeting correctness expectations is also difficult, given that the specifications provided are incomplete, and that the users of such systems are typically non-programmers.
In this paper, we describe how interactivity can be leveraged to develop efficient synthesis algorithms, as well as to decrease the cognitive burden that a user endures trying to ensure that the system produces the desired program. We build a formal model of user interaction along three dimensions: incremental algorithm, step-based problem formulation, and feedback-based intent refinement. We then illustrate the effectiveness of each of these forms of interactivity with respect to synthesis performance and correctness on a set of real-world case studies.
Automatic program transformation tools can be valuable for programmers to help them with refactoring tasks, and for Computer Science students in the form of tutoring systems that suggest repairs to programming assignments. However, manually creating catalogs of transformations is complex and time-consuming. In this paper, we present Refazer, a technique for automatically learning program transformations. Refazer builds on the observation that code edits performed by developers can be used as input-output examples for learning program transformations. Example edits may share the same structure but involve different variables and subexpressions, which must be generalized in a transformation at the right level of abstraction. To learn transformations, Refazer leverages state-of-the-art programming-by-example methodology using the following key components: (a) a novel domain-specific language (DSL) for describing program transformations, (b) domain-specific deductive algorithms for efficiently synthesizing transformations in the DSL, and (c) functions for ranking the synthesized transformations.
We instantiate and evaluate Refazer in two domains. First, given examples of code edits used by students to fix incorrect programming assignment submissions, we learn program transformations that can fix other students’ submissions with similar faults. In our evaluation conducted on 4 programming tasks performed by 720 students, our technique helped to fix incorrect submissions for 87% of the students. In the second domain, we use repetitive code edits applied by developers to the same project to synthesize a program transformation that applies these edits to other locations in the code. In our evaluation conducted on 56 scenarios of repetitive edits taken from three large C# open-source projects, Refazer learns the intended program transformation in 84% of the cases using only 2.9 examples on average.
Programming by Examples (PBE) has the potential to revolutionize end-user programming by enabling end users, most of whom are non-programmers, to create small scripts for automating repetitive tasks. However, examples, though easy to provide, are an ambiguous specification of the user’s intent. Because of that, a key impedance in adoption of PBE systems is the lack of user confidence in the correctness of the program that was synthesized by the system.
We present two novel user interaction models that communicate actionable information to the user to help resolve ambiguity in the examples. One of these models allows the user to effectively navigate between the huge set of programs that are consistent with the examples provided by the user. The other model uses active learning to ask directed example-based questions to the user on the test input data over which the user intends to run the synthesized program.
Our user studies show that each of these models significantly reduces the number of errors in the performed task without any difference in completion time. Moreover, both models are perceived as useful, and the proactive active-learning based model has a slightly higher preference regarding the users’ confidence in the result.
Inductive synthesis, or programming-by-examples (PBE) is gaining prominence with disruptive applications for automating repetitive tasks in end-user programming. However, designing, developing, and maintaining an effective industrial-quality inductive synthesizer is an intellectual and engineering challenge, requiring 1-2 man-years of effort.
Our novel observation is that many PBE algorithms are a natural fall-out of one generic meta-algorithm and the domain-specific properties of the operators in the underlying domain-specific language (DSL). The meta-algorithm propagates example-based constraints on an expression to its subexpressions by leveraging associated witness functions, which essentially capture the inverse semantics of the underlying operator. This observation enables a novel program synthesis methodology called data-driven domain-specific deduction (D4), where domain-specific insight, provided by the DSL designer, is separated from the synthesis algorithm.
Our FlashMeta framework implements this methodology, allowing synthesizer developers to generate an efficient synthesizer from the mere DSL definition (if properties of the DSL operators have been modeled). In our case studies, we found that 10+ existing industrial-quality mass-market applications based on PBE can be cast as instances of D4. Our evaluation includes reimplementation of some prior works, which in FlashMeta become more efficient, maintainable, and extensible. As a result, FlashMeta-based PBE tools are deployed in several industrial products, including Microsoft PowerShell 3.0 for Windows 10, Azure Operational Management Suite, and Microsoft Cortana digital assistant.
Word problems are an established technique for teaching mathematical modeling skills in K-12 education. However, many students find word problems unconnected to their lives, artificial, and uninteresting. Most students find them much more difficult than the corresponding symbolic representations. To account for this phenomenon, an ideal pedagogy might involve an individually crafted progression of unique word problems that form a personalized plot.
We propose a novel technique for automatic generation of personalized word problems. In our system, word problems are generated from general specifications using answer-set programming (ASP). The specifications include tutor requirements (properties of a mathematical model), and student requirements (personalization, characters, setting). Our system takes a logical encoding of the specification, synthesizes a word problem narrative and its mathematical model as a labeled logical plot graph, and realizes the problem in natural language. Human judges found our problems as solvable as the textbook problems, with a slightly more artificial language.
We show how to programmatically model processes that humans use when extracting answers to queries (e.g., “Who invented typewriter?”, “List of Washington national parks”) from semi-structured Web pages returned by a search engine. This modeling enables various applications including automating repetitive search tasks, and helping search engine developers design micro-segments of factoid questions.
We describe the design and implementation of a domain-specific language that enables extracting data from a webpage based on its structure, visual layout, and linguistic patterns. We also describe an algorithm to rank multiple answers extracted from multiple webpages.
On 100,000+ queries (across 7 micro-segments) obtained from Bing logs, our system LaSEWeb answered queries with an average recall of 71%. Also, the desired answer(s) were present in top-3 suggestions for 95%+ cases.