<!DOCTYPE html>
<html>
<head>
<title>L.B.Stanza</title>
<link type="text/css" rel="stylesheet" href="resources/mainstyle.css">
<link type="text/css" rel="stylesheet" href="resources/documentation.css">
</head>
<body>
<table class="wrap">
<tr><td colspan="3" class="banner">
<a href="index.html">Home</a><a href="stanzabyexample.html">Table of Contents</a><a href="chapter9.html">Previous Chapter</a><a href="chapter10.html">Next Chapter</a>
</td></tr>
<tr>
<td class="nav">
<h1>NAVIGATION</h1>
<h2><a href="#anchor339">Stanza's Macro System</a></h2><h3><a href="#anchor88">What Is a Macro?</a></h3><h3><a href="#anchor89">Defining and Using Your First Macro</a></h3><h4><a href="#anchor340">Creating the Macro File</a></h4><h4><a href="#anchor341">Extending the Compiler</a></h4><h4><a href="#anchor342">Using Your New Macro</a></h4><h3><a href="#anchor90">Exploring Further</a></h3><h4><a href="#anchor343">Programmatic Code Transformations</a></h4><h4><a href="#anchor344">Bugs in Macros</a></h4><h4><a href="#anchor345">Building an Optimized Compiler</a></h4><h3><a href="#anchor91">The DefSyntax System: A Small Experiment Framework</a></h3><h4><a href="#anchor346">A Small Syntax</a></h4><h4><a href="#anchor347">Parsing Using the Syntax</a></h4><h4><a href="#anchor348">Initial Experiments</a></h4><h3><a href="#anchor92">The Pattern Language</a></h3><h4><a href="#anchor349">Literals</a></h4><h4><a href="#anchor350">Wildcards</a></h4><h4><a href="#anchor351">Concatenation</a></h4><h4><a href="#anchor352">Lists</a></h4><h4><a href="#anchor353">Ellipsis</a></h4><h4><a href="#anchor354">Splice-Ellipsis</a></h4><h4><a href="#anchor355">Examples of Combining Patterns</a></h4><h4><a href="#anchor356">Escaping</a></h4><h4><a href="#anchor357">Understanding the Lexer Shorthands</a></h4><h4><a href="#anchor358">Patterns with Lexer Shorthands</a></h4><h3><a href="#anchor93">Productions and Rules</a></h3><h4><a href="#anchor359">What is a Rule?</a></h4><h4><a href="#anchor360">What is a Production?</a></h4><h4><a href="#anchor361">Defining Multiple Productions</a></h4><h4><a href="#anchor362">Order of Rule Matching</a></h4><h4><a href="#anchor363">Failure Rules</a></h4><h4><a href="#anchor364">Closest Info</a></h4><h4><a href="#anchor365">Referencing Productions in Patterns</a></h4><h4><a href="#anchor366">Binders in Patterns</a></h4><h4><a href="#anchor367">Advanced Binding Patterns</a></h4><h4><a href="#anchor368">Guard Predicates</a></h4><h4><a href="#anchor369">Importing 
Productions</a></h4><h3><a href="#anchor94">The Stanza Core Macros</a></h3>
</td>
<td class="main">
<h1 id="anchor339">Stanza's Macro System</h1><p>(This chapter is still a work in progress - May 31, 2022)</p><p>This chapter teaches you about Stanza's macro system: what macros are, how to write your own, and some examples of using them.</p><p>Stanza's macro system results from the combination of three separate concepts and subsystems:</p><ol><li>an s-expression-based programmatic code transformation system,
</li><li>an extensible grammar system, and
</li><li>a template-based code generation utility.
</li></ol><h2 id="anchor88">What Is a Macro?</h2><p>A macro is a syntactic shorthand for some longer piece of code. Stanza's core design relies heavily upon macros, and we have already seen many of them.</p><p>As a simple example, the following code uses the <code>while</code> macro that is included with Stanza's core library:</p><pre><code>var counter:Int = 0<br>while counter < 10 :<br>   println("counter = %_" % [counter])<br>   counter = counter + 1</code></pre><p>If you don't want to use the <code>while</code> loop, you can also type the following instead:</p><pre><code>var counter:Int = 0<br>defn* loop () :<br>   if counter < 10 :<br>      println("counter = %_" % [counter])<br>      counter = counter + 1<br>      loop()<br>loop()</code></pre><p>It works just the same; it is just slightly more verbose.</p><p>Here is another example. This is the syntax that we typically use to call the <code>do</code> function.</p><pre><code>for i in 0 to 10 do :<br>   println("This is inside the loop.")<br>   println("i is equal to %_" % [i])</code></pre><p>But, similarly, you could also type the following instead:</p><pre><code>do(<br>   fn (i) :<br>      println("This is inside the loop.")<br>      println("i is equal to %_" % [i])<br>   0 to 10)</code></pre><p>It looks slightly uglier, but it works just the same. </p><p>The key point is that a macro is just a syntactic abbreviation. 
A user is always free to choose not to use a macro if they are willing to type out the longer form of the code instead.</p><h2 id="anchor89">Defining and Using Your First Macro</h2><p>When debugging, it is common to write code that looks like this:</p><pre><code>println("DEBUG: x = %~" % [x])</code></pre><p>It prints out the current value of a variable.</p><p>Let's write a macro that will allow us to type this instead:</p><pre><code>PROBE(x)</code></pre><p>and have it automatically expand into the code above.</p><h3 id="anchor340">Creating the Macro File</h3><p>Create a new file called <code>debugmacros.stanza</code> containing the following contents:</p><pre><code>defpackage debugmacros :<br>   import core<br>   import collections<br>   import stz/core-macros<br><br>defsyntax mydebugmacros :<br>   import exp4 from core<br><br>   defrule exp4 = (PROBE(?myvariable)) :<br>      val format-string = to-string("DEBUG: %_ = %%~" % [name(unwrap-token(myvariable))])<br>      val form = qquote(println(~ format-string % [~ myvariable]))<br>      parse-syntax[core / #exp](form)</code></pre><p>This file defines a new <span style="font-style:italic;">syntax package</span> called <code>mydebugmacros</code>, which contains the definition of a new macro called <code>PROBE</code>.</p><h3 id="anchor341">Extending the Compiler</h3><p>In order to use the new macro definition, we need to first <span style="font-style:italic;">extend Stanza</span> with the new macro definitions.</p><p>Open your terminal and type in the following:</p><pre><code>stanza extend debugmacros.stanza -o myextendedstanza</code></pre><p>This will result in a new Stanza compiler called <code>myextendedstanza</code> that now supports the new syntax.</p><h3 id="anchor342">Using Your New Macro</h3><p>Create a new file called <code>trymacros.stanza</code> containing the following contents:</p><pre><code>#use-added-syntax(mydebugmacros)<br>defpackage trymacros :<br> import core<br> import collections<br><br>defn main () :<br> val x = 10<br> 
val y = "Hello world"<br> val z = x * 10<br> PROBE(x)<br> PROBE(y)<br> PROBE(z)<br><br>main()</code></pre><p>And use your new extended compiler to compile and run it:</p><pre><code>./myextendedstanza trymacros.stanza -o trymacros<br>./trymacros</code></pre><p>It should print out:</p><pre><code>DEBUG: x = 10<br>DEBUG: y = "Hello world"<br>DEBUG: z = 100</code></pre><p>That's a useful utility! Notice that this abbreviation is something that could only be written as a macro. It is not possible to write a function that behaves like <code>PROBE</code>.</p><h2 id="anchor90">Exploring Further</h2><p>Our <code>debugmacros.stanza</code> file introduces a number of new concepts: <code>defsyntax</code>, <code>defrule</code>, and <code>parse-syntax</code>. Let's explore each in turn.</p><h3 id="anchor343">Programmatic Code Transformations</h3><p>The body of the <code>defrule</code> construct is allowed to contain arbitrary Stanza code.</p><p>Add the following print statements to the code:</p><pre><code>defpackage debugmacros :<br>   import core<br>   import collections<br>   import stz/core-macros<br><br>defsyntax mydebugmacros :<br>   import exp4 from core<br><br>   defrule exp4 = (PROBE(?myvariable)) :<br>      println("Implementation of PROBE macro.")<br>      println("myvariable = %~" % [myvariable])<br><br>      val format-string = to-string("DEBUG: %_ = %%~" % [name(unwrap-token(myvariable))])<br>      println("format-string = %~" % [format-string])<br><br>      val form = qquote(println(~ format-string % [~ myvariable]))<br>      println("form = %~" % [form])<br><br>      val result = parse-syntax[core / #exp](form)<br>      println("result = %~" % [result])<br>      println("\n")<br><br>      result</code></pre><p>Now rebuild the extended compiler, and use it to compile our <code>trymacros.stanza</code> file again.</p><pre><code>./myextendedstanza trymacros.stanza -o trymacros</code></pre><p>You should see, <span style="font-style:italic;">during compilation</span>, the following messages being printed out:</p><pre><code>Implementation of 
PROBE macro.<br>myvariable = x<br>format-string = "DEBUG: x = %~"<br>form = (println (@do "DEBUG: x = %~" % (@tuple x)))<br>result = ($do println ($do modulo "DEBUG: x = %~" ($tuple x)))<br><br><br>Implementation of PROBE macro.<br>myvariable = y<br>format-string = "DEBUG: y = %~"<br>form = (println (@do "DEBUG: y = %~" % (@tuple y)))<br>result = ($do println ($do modulo "DEBUG: y = %~" ($tuple y)))<br><br><br>Implementation of PROBE macro.<br>myvariable = z<br>format-string = "DEBUG: z = %~"<br>form = (println (@do "DEBUG: z = %~" % (@tuple z)))<br>result = ($do println ($do modulo "DEBUG: z = %~" ($tuple z)))</code></pre><p>Notice that we have not yet run the <code>trymacros</code> executable. These messages are printed out during the <span style="font-style:italic;">compilation</span> of <code>trymacros.stanza</code>. Macros execute at compile time. </p><p>Let's focus on the messages printed out just for the expression <code>PROBE(x)</code>. </p><p>The line</p><pre><code>defrule exp4 = (PROBE(?myvariable))</code></pre><p>defines a new syntax rule for Stanza expressions. The <code>PROBE(?myvariable)</code> part is the <span style="font-style:italic;">pattern</span>. In this case, our pattern matches any code that looks like <code>PROBE(...)</code>, where a single s-expression is allowed within the ellipsis. </p><p>The question mark in front of <code>?myvariable</code> indicates that it is a <span style="font-style:italic;">pattern variable</span>. Within the body of the <code>defrule</code>, <code>myvariable</code> will refer to whatever s-expression the user provided within <code>PROBE(...)</code>. For the usage <code>PROBE(x)</code>, <code>myvariable</code> will be bound to the symbol <code>x</code>. </p><p>This can be observed in the message:</p><pre><code>myvariable = x</code></pre><p>Next, we use some basic string manipulation to construct the format string. 
The message</p><pre><code>format-string = "DEBUG: x = %~"</code></pre><p>shows us the final constructed string.</p><p>Finally, we use the <code>qquote</code> utility to construct an s-expression containing the code that we want the macro to expand into. This results in the form:</p><pre><code>form = (println (@do "DEBUG: x = %~" % (@tuple x)))</code></pre><p>Recall that the <code>@do</code> and <code>@tuple</code> symbols are inserted by the lexer. If we write the above form using the lexer's shorthand notation, it becomes:</p><pre><code>println("DEBUG: x = %~" % [x])</code></pre><p>which is exactly the final code that we want the macro to expand into.</p><p>The final step satisfies a requirement of the Stanza macro system. Each Stanza macro must return the final code to execute in terms of fully-expanded <span style="font-style:italic;">core forms</span>. To do that, we call <code>parse-syntax</code> to continue expanding any remaining macros in the code. The fully-expanded form is then shown in the message:</p><pre><code>result = ($do println ($do modulo "DEBUG: x = %~" ($tuple x)))</code></pre><h3 id="anchor344">Bugs in Macros</h3><p>Our macro actually contains some errors in its implementation. Let's see what happens when it crashes.</p><p>Try changing the <code>trymacros.stanza</code> file to the following:</p><pre><code>#use-added-syntax(mydebugmacros)<br>defpackage trymacros :<br>   import core<br>   import collections<br><br>defn main () :<br>   val x = 10<br>   val y = "Hello world"<br>   val z = x * 10<br>   PROBE((x + z))<br><br>main()</code></pre><p>And try compiling it again. 
</p><pre><code>./myextendedstanza trymacros.stanza -o trymacros</code></pre><p>Our system crashes with the following printout:</p><pre><code>Implementation of PROBE macro.<br>myvariable = (x + z)<br>FATAL ERROR: No appropriate branch for arguments of type (FullList).<br>  in core/print-stack-trace<br>    at core/core.stanza:329.14<br>  in core/print-stack-trace<br>    at core/core.stanza:335.2<br>  in core/fatal<br>    at core/core.stanza:382.2<br>  ...</code></pre><p>This is caused by the call to:</p><pre><code>name(unwrap-token(myvariable))</code></pre> <p><code>name</code> is a function that can only be called on <code>Symbol</code> objects, but in this case <code>myvariable</code> is a <code>List</code>.</p><p>So be cautious. When a macro crashes, it causes the entire compiler to crash. </p><p>We can fix this by adding the following check:</p><pre><code>defrule exp4 = (PROBE(?myvariable)) :<br>   println("Implementation of PROBE macro.")<br>   println("myvariable = %~" % [myvariable])<br><br>   ;Check that PROBE is called correctly.<br>   if unwrap-token(myvariable) is-not Symbol :<br>      throw(Exception("%_: Incorrect usage of PROBE(x). \<br>         The argument to PROBE must be a symbol." % [<br>         closest-info()]))<br><br>   val format-string = to-string("DEBUG: %_ = %%~" % [name(unwrap-token(myvariable))])<br>   println("format-string = %~" % [format-string])<br><br>   val form = qquote(println(~ format-string % [~ myvariable]))<br>   println("form = %~" % [form])<br><br>   val result = parse-syntax[core / #exp](form)<br>   println("result = %~" % [result])<br>   println("\n")<br><br>   result</code></pre><p>With this additional guard, the system will print out the following:
</p><pre><code>[WORK IN PROGRESS]</code></pre><h3 id="anchor345">Building an Optimized Compiler</h3><p>We have been using the following command to extend the compiler:</p><pre><code>stanza extend debugmacros.stanza -o myextendedstanza</code></pre><p>You might have noticed that the extended compiler runs a bit slower than you're used to. This is because the extended compiler is compiled without optimizations, while the standard Stanza compiler is compiled in optimized mode.</p><p>When you are confident in the implementation of your macros, you can compile an optimized version of the compiler using this command:</p><pre><code>stanza extend debugmacros.stanza -o myextendedstanza -optimize</code></pre><p>Be cautious about using this before your macros have been fully debugged, though. Optimized mode removes many safety checks for detecting errors early, and an incorrect program may behave strangely.</p><h2 id="anchor91">The DefSyntax System: A Small Experiment Framework</h2><p>The <code>defsyntax</code> system is Stanza's built-in parsing mechanism for s-expressions. It is the underlying system that the macro system uses to extend the syntax of the language, and it can also be used as a standalone utility.</p><p>To explore the <code>defsyntax</code> system, we will build a small framework that lets us quickly try out different syntax definitions. 
</p><h3 id="anchor346">A Small Syntax</h3><p>Create a new directory and create a file called <code>myparser.stanza</code> containing:</p><pre><code>defpackage myparser :<br>   import core<br>   import collections<br><br>defsyntax my-experimental-language :<br><br>   public defproduction sentence: String<br><br>   defrule sentence = (the quick red fox) :<br>      "Sentence about foxes"<br><br>   defrule sentence = (the lazy brown dog) :<br>      "Sentence about dogs"<br><br>   defrule sentence = (the 3 "friendly" lions) :<br>      "Sentence about lions"</code></pre><p>This file contains the definition of the <code>my-experimental-language</code> syntax. </p><h3 id="anchor347">Parsing Using the Syntax</h3><p>Now create another file called <code>test-myparser.stanza</code> containing:</p><pre><code>defpackage test-myparser :<br>   import core<br>   import collections<br>   import reader<br>   import myparser<br><br>defn main () :<br>   val forms = read-file("test-input.txt")<br>   println("PARSING:\n%_\n\n" % [forms])<br><br>   try :<br>      val parsed = parse-syntax[my-experimental-language / #sentence](forms)<br>      println("RESULT:\n%_\n\n" % [parsed])<br>   catch (e:Exception) :<br>      println("Could not parse forms.")<br>      println(e)<br><br>main()</code></pre><p>This file reads in the s-expressions contained within a text file and asks the parsing system to interpret the forms as a <code>sentence</code> as defined in the <code>my-experimental-language</code> syntax. 
</p><h3 id="anchor348">Initial Experiments</h3><p>Now create our input test file, <code>test-input.txt</code>, containing:</p><pre><code>the lazy brown dog</code></pre><p>And run our system like this:</p><pre><code>stanza run myparser.stanza test-myparser.stanza</code></pre><p>You should see the following printed out:</p><pre><code>[WORK IN PROGRESS]</code></pre><p>We can try out the other recognized sentences too.</p><p>If we fill <code>test-input.txt</code> with:</p><pre><code>the quick red fox</code></pre><p>then the program prints out:</p><pre><code>[WORK IN PROGRESS]</code></pre><p>If we fill <code>test-input.txt</code> with:</p><pre><code>the 3 "friendly" lions</code></pre><p>then the program prints out:</p><pre><code>[WORK IN PROGRESS]</code></pre><p>If we fill <code>test-input.txt</code> with an unrecognized sentence:</p><pre><code>the quick blue fox</code></pre><p>then the program prints out:</p><pre><code>[WORK IN PROGRESS]</code></pre><h2 id="anchor92">The Pattern Language</h2><p>Using the above framework, we can now learn about the different patterns that the <code>defrule</code> construct supports. </p><h3 id="anchor349">Literals</h3><p>Our patterns so far consist of "literals". A literal pattern matches only an s-expression that is exactly equal to it. 
</p><p>The literal pattern:</p><pre><code>quick</code></pre><p>matches against the s-expression:</p><pre><code>quick</code></pre><p>The literal pattern:</p><pre><code>3</code></pre><p>matches against the s-expression:</p><pre><code>3</code></pre><p>The literal pattern:</p><pre><code>"friendly"</code></pre><p>matches against the s-expression:</p><pre><code>"friendly"</code></pre><p>It does <span style="font-style:italic;">not</span> match against the s-expression:</p><pre><code>friendly</code></pre><p>The literal pattern:</p><pre><code>3L</code></pre><p>matches against:</p><pre><code>3L</code></pre><p>It does <span style="font-style:italic;">not</span> match against the s-expression:</p><pre><code>3</code></pre><h3 id="anchor350">Wildcards</h3><p>The wildcard pattern <code>_</code> matches against any single s-expression.</p><p>The pattern:</p><pre><code>_</code></pre><p>matches against all of the following s-expressions:</p><pre><code>3<br>"friendly"<br>(a b c)<br>3L<br>Pumbaa</code></pre><p>Note that <code>(a b c)</code> is a single s-expression.</p><h3 id="anchor351">Concatenation</h3><p>Multiple patterns can be concatenated together to form a longer pattern.</p><p>The pattern:</p><pre><code>a b</code></pre><p>matches against the following s-expressions:</p><pre><code>a b</code></pre><p>The pattern:</p><pre><code>a _ _ x</code></pre><p>matches against all of the following s-expressions:</p><pre><code>a y z x<br>a 1 2 x<br>a (1 2 3) (1 2 3) x</code></pre><h3 id="anchor352">Lists</h3><p>The list pattern <code>(...)</code> matches against list s-expressions.</p><p>The pattern:</p><pre><code>()</code></pre><p>matches against the following s-expression:</p><pre><code>()</code></pre><p>The pattern:</p><pre><code>(a)</code></pre><p>matches against the following s-expression:</p><pre><code>(a)</code></pre><p>The pattern:</p><pre><code>(a 3 "x")</code></pre><p>matches against the following s-expression:</p><pre><code>(a 3 "x")</code></pre><p>The pattern:</p><pre><code>(a 
3 (y))</code></pre><p>matches against the following s-expression:</p><pre><code>(a 3 (y))</code></pre><p>The pattern:</p><pre><code>(a _ (_ y))</code></pre><p>matches against all of the following s-expressions:</p><pre><code>(a b (x y))<br>(a 3 ("hello" y))<br>(a "world" (x y))<br>(a (x y z) ((x y z) y))</code></pre><h3 id="anchor353">Ellipsis</h3><p>An ellipsis pattern matches zero or more occurrences of a single pattern.</p><p>The pattern:</p><pre><code>a ...</code></pre><p>matches against all of the following s-expressions:</p><pre><code>a<br>a a<br>a a a a<br>a a a a a a a a a</code></pre><p>It even matches against the empty s-expression:</p><pre><code></code></pre><p>The pattern:</p><pre><code>3 ...</code></pre><p>matches against all of the following s-expressions:</p><pre><code>3<br>3 3<br>3 3 3 3<br>3 3 3 3 3 3 3 3 3</code></pre><p>The pattern:</p><pre><code>_ ...</code></pre><p>matches against all of the following s-expressions:</p><pre><code>a<br>a a a<br>x y z w<br>x y (1 2 3) (1 2 3)<br>() () () ()</code></pre><p>And, like any ellipsis pattern, it matches against the empty s-expression:</p><pre><code></code></pre><p>So effectively, the pattern <code>_ ...</code> can match against anything at all.</p><p>The pattern:</p><pre><code>(x _ z) ...</code></pre><p>matches against all of the following s-expressions:</p><pre><code>(x y z)<br>(x y z) (x w z) (x h z)<br>(x 0 z) (x 1 z) (x 2 z)<br>(x () z) (x (0) z) (x (0 0) z)</code></pre><h3 id="anchor354">Splice-Ellipsis</h3><p>A "splice ellipsis" pattern can only be used following a list pattern. It is similar to the normal ellipsis pattern in that it matches zero or more occurrences of the pattern, but the repetition applies to the subpatterns within the list, which are matched inline rather than as a nested list. 
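</p><p>As a rough sketch of how to try this out in the experiment framework from earlier, a rule like the following could be placed in <code>myparser.stanza</code>. The rule is an illustrative assumption rather than part of the original example: it reuses the <code>sentence</code> production, and the returned string is made up.</p><pre><code>defpackage myparser :<br>   import core<br>   import collections<br><br>defsyntax my-experimental-language :<br><br>   public defproduction sentence: String<br><br>   ;Hypothetical rule for experimenting with the splice-ellipsis.<br>   ;The outer parentheses delimit the rule's pattern, as in the earlier rules.<br>   defrule sentence = ((x y) @...) :<br>      "Repeated x y pairs"</code></pre><p>The <code>test-input.txt</code> file can then be filled with the inputs discussed in this section to observe which ones the rule accepts.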
</p><p>The pattern:</p><pre><code>(x y) @...</code></pre><p>matches against all of the following s-expressions:</p><pre><code>x y<br>x y x y<br>x y x y x y x y</code></pre><p>It does <span style="font-style:italic;">not</span> match against the s-expression:</p><pre><code>x y x</code></pre><p>It does <span style="font-style:italic;">not</span> match against the s-expression:</p><pre><code>(x y)</code></pre><p>The pattern:</p><pre><code>(x _ z) @...</code></pre><p>matches against all of the following s-expressions:</p><pre><code>x 0 z<br>x 0 z x 1 z x 2 z x 3 z<br>x () z x (0) z x (0 0) z x (0 0 0) z</code></pre><h3 id="anchor355">Examples of Combining Patterns</h3><p>The list and ellipsis patterns are very powerful, and can be combined to form very expressive patterns. Here are some examples.</p><p>The pattern:</p><pre><code>(x y ...) @...</code></pre><p>matches against all of the following s-expressions:</p><pre><code>x<br>x x<br>x x x x<br>x y<br>x y y y<br>x y y x y y y x y y y y y</code></pre><p>The pattern:</p><pre><code>begin ((_ . _) @...) ... end</code></pre><p>matches against all of the following s-expressions:</p><pre><code>begin (x . int) end<br>begin (x . int y . string) end<br>begin (x . int y . string w . int) end<br>begin (x . int y . string w . int) (x . int) (x . string) end</code></pre><h3 id="anchor356">Escaping</h3><p>Most symbols that appear in a pattern, such as <code>myname</code>, <code>myconstruct</code>, <code>x</code>, <code>y</code>, and <code>z</code>, are interpreted as simple literal patterns. A small number of symbols have special meanings, such as <code>...</code> and <code>@...</code>.</p><p>So what is the pattern that would actually match against the following s-expression?</p><pre><code>a ... @... b</code></pre><p>In this case, use the "escape" operator, <code>~</code>, to specify that the next s-expression in a pattern should be interpreted as a simple literal. </p><p>The pattern:</p><pre><code>a ~ ... ~ @... 
b</code></pre><p>matches against the s-expression:</p><pre><code>a ... @... b</code></pre><p>The pattern:</p><pre><code>a ~ ~ b</code></pre><p>matches against the s-expression:</p><pre><code>a ~ b</code></pre><h3 id="anchor357">Understanding the Lexer Shorthands</h3><p>To make writing code convenient and increase readability, Stanza's lexer automatically provides a small set of abbreviations. These abbreviations are fixed and cannot be modified by the user:</p><pre><code>{x} is an abbreviation for (@afn x)<br><br>[x] is an abbreviation for (@tuple x)<br><br>f(x) is an abbreviation for f (@do x)<br><br>f{x} is an abbreviation for f (@do-afn x)<br><br>f[x] is an abbreviation for f (@get x)<br><br>f&lt;x&gt; is an abbreviation for f (@of x)<br><br>?x is an abbreviation for (@cap x)<br><br>`sexp is an abbreviation for (@quote sexp)<br><br>a b c :<br>   d e f<br>is an abbreviation for a b c : (d e f)</code></pre><p>Curly brackets (<code>{}</code>) expand to a list with the <code>@afn</code> symbol as its first item. Square brackets (<code>[]</code>) expand to a list with the <code>@tuple</code> symbol as its first item. An s-expression followed immediately by an opening parenthesis (<code>(</code>) inserts the <code>@do</code> symbol as the first item in the following list. An s-expression followed immediately by an opening curly bracket (<code>{</code>) inserts the <code>@do-afn</code> symbol as the first item in the following list. An s-expression followed immediately by an opening square bracket (<code>[</code>) inserts the <code>@get</code> symbol as the first item in the following list. An s-expression followed immediately by an opening angle bracket (<code>&lt;</code>) inserts the <code>@of</code> symbol as the first item in the following list. A question mark followed immediately by a symbol expands to a list with the <code>@cap</code> symbol as its first item. A backquote followed by an s-expression expands to a list with the <code>@quote</code> symbol as its first item. 
A line-ending colon automatically wraps the next indented block in a list.</p><p>Commas are treated identically to whitespace. Thus:</p><pre><code>(x, y, z)</code></pre><p>is an abbreviation for:</p><pre><code>(x y z)</code></pre><p>These abbreviations need to be taken into consideration when writing patterns. </p><p>As an example, the pattern:</p><pre><code>(@tuple _ ...)</code></pre><p>matches against all of the following s-expressions:</p><pre><code>(@tuple)<br>(@tuple x)<br>(@tuple x y z z z)<br>[]<br>[x]<br>[x y z z z]<br>[x, y, z, z, z]</code></pre><p>The pattern:</p><pre><code>plus (@do _ _)</code></pre><p>matches against all of the following s-expressions:</p><pre><code>plus (@do x y)<br>plus (@do 1 2)<br>plus(x y)<br>plus(1 2)<br>plus(x, y)<br>plus(1, 2)</code></pre><p>The pattern:</p><pre><code>while x : (println)</code></pre><p>matches against the following s-expressions:</p><pre><code>while x : (println)</code></pre><p>and it matches against these s-expressions:</p><pre><code>while x :<br>   println</code></pre><p>but it does <span style="font-style:italic;">not</span> match against these s-expressions:</p><pre><code>while x : println</code></pre><h3 id="anchor358">Patterns with Lexer Shorthands</h3><p>Note that the lexer shorthands apply identically to patterns as well. 
</p><p>Thus the pattern:</p><pre><code>[_ ...]</code></pre><p>is identical to the pattern:</p><pre><code>(@tuple _ ...)</code></pre><p>And the pattern:</p><pre><code>while x :<br> println</code></pre><p>is identical to the pattern:</p><pre><code>while x : (println)</code></pre><p>It is customary to use lexer shorthands in the pattern definitions to improve readability.</p><p>Thus the pattern:</p><pre><code>[_ ...]</code></pre><p>matches against the following s-expressions:</p><pre><code>[x, y, z, z, z]</code></pre><p>The pattern:</p><pre><code>plus(_, _)</code></pre><p>matches against all the following s-expressions:</p><pre><code>plus(x, y)<br>plus(1, 2)</code></pre><p>The pattern:</p><pre><code>while x :<br> println</code></pre><p>matches against the following s-expressions:</p><pre><code>while x :<br> println</code></pre><p>but it does <span style="font-style:italic;">not</span> match against:</p><pre><code>while x : println</code></pre><h2 id="anchor93">Productions and Rules</h2><p></p><h3 id="anchor359">What is a Rule?</h3><p>A <span style="font-style:italic;">rule</span> is a combination of a pattern and a block of code to execute if the pattern matches. A <span style="font-style:italic;">rule</span> is specified for a <span style="font-style:italic;">production</span>. Here is an example rule that we used in our experiment framework:</p><pre><code>defrule sentence = (the quick red fox) :<br> "Sentence about foxes"</code></pre><p>The above syntax specifies the following:</p><ol><li>This is a new rule for the <code>sentence</code> production.
</li><li>The pattern is <code>the quick red fox</code>. So this rule matches any s-expressions that match this pattern.
</li><li>If the s-expressions match, then the rule returns the string <code>"Sentence about foxes"</code>.
</li></ol><h3 id="anchor360">What is a Production?</h3><p>A <span style="font-style:italic;">production</span> is a named set of rules. Our experiment framework defined a single production called <code>sentence</code>:</p><pre><code>public defproduction sentence: String</code></pre><p>The above specifies:</p><ol><li>This is a new production called <code>sentence</code>.
</li><li>The rules for this production must return a <code>String</code> if they match.
</li><li>This production is <span style="font-style:italic;">public</span> and is visible to users of this syntax package.
</li></ol><p>Recall that to use our syntax package to parse some s-expressions we used the following:</p><pre><code>val parsed = parse-syntax[my-experimental-language / #sentence](forms)</code></pre><p>The above specifies:</p><ol><li>Parse the s-expressions contained in the variable <code>forms</code>.
</li><li>Parse using the rules associated with the <code>sentence</code> production in the <code>my-experimental-language</code> syntax package.
</li><li>On a successful match the rule that matched will execute and return a <code>String</code>. This string will be stored into the <code>parsed</code> variable.
</li></ol><h3 id="anchor361">Defining Multiple Productions</h3><p>A syntax package can contain as many productions as we like.</p><p>Let's introduce one more production to our experimental syntax package. Here is the new <code>myparser.stanza</code>:</p><pre><code>defpackage myparser :<br>   import core<br>   import collections<br><br>defsyntax my-experimental-language :<br><br>   public defproduction sentence: String<br><br>   defrule sentence = (the quick red fox) :<br>      "Sentence about foxes"<br><br>   defrule sentence = (the lazy brown dog) :<br>      "Sentence about dogs"<br><br>   defrule sentence = (the 3 "friendly" lions) :<br>      "Sentence about lions"<br><br>   public defproduction animal: String<br>   defrule animal = (fox) : "little fox"<br>   defrule animal = (dog) : "loyal dog"<br>   defrule animal = (lion) : "regal lion"</code></pre><p>Let's now modify the test program to try parsing some s-expressions using both productions. It will first attempt to parse the contents as a <code>sentence</code>, and then attempt to parse the contents as an <code>animal</code>. 
</p><pre><code>defpackage test-myparser :<br>   import core<br>   import collections<br>   import reader<br>   import myparser<br><br>defn main () :<br>   val forms = read-file("test-input.txt")<br>   println("PARSING:\n%_\n\n" % [forms])<br><br>   try :<br>      val parsed = parse-syntax[my-experimental-language / #sentence](forms)<br>      println("RESULT:\n%_\n\n" % [parsed])<br>   catch (e:Exception) :<br>      println("Could not parse forms as sentence.")<br>      println(e)<br><br>   try :<br>      val parsed = parse-syntax[my-experimental-language / #animal](forms)<br>      println("RESULT:\n%_\n\n" % [parsed])<br>   catch (e:Exception) :<br>      println("Could not parse forms as animal.")<br>      println(e)<br><br>main()</code></pre><p>If we now fill <code>test-input.txt</code> with:</p><pre><code>lion</code></pre><p>and run the test program it will print out:</p><pre><code>[WORK IN PROGRESS]</code></pre><p>Notice that the input could <span style="font-style:italic;">not</span> be parsed as a <code>sentence</code>, but it can be successfully parsed as an <code>animal</code>. </p><h3 id="anchor362">Order of Rule Matching</h3><p>When searching for a match, the rules for a production are tested one at a time, in the order in which they appear, until the system reaches the first rule that matches. </p><p>Here is an example of a production with multiple rules: </p><pre><code>public defproduction sentence: String<br><br>defrule sentence = (one big chance) :<br>   "One big chance"<br><br>defrule sentence = (one _ chance) :<br>   "One ??? chance"<br><br>defrule sentence = (_ big chance) :<br>   "??? big chance"<br><br>defrule sentence = (_ _ _) :<br>   "Default case"</code></pre><p>If we try to parse the following input:</p><pre><code>one big chance</code></pre><p>using the above production, the system will try out the first pattern <code>one big chance</code>. 
Since this pattern matches, the system will return <code>"One big chance"</code> and skip testing the rest of the rules.</p><p>If we try to parse:</p><pre><code>one small chance</code></pre><p>then the system will return:</p><pre><code>"One ??? chance"</code></pre><p>because the <code>one _ chance</code> pattern is the first pattern that matches.</p><p>If we try to parse:</p><pre><code>my big chance</code></pre><p>then the system will return:</p><pre><code>"??? big chance"</code></pre><p>Finally, if we try to parse:</p><pre><code>my big break</code></pre><p>then the system will return:</p><pre><code>"Default case"</code></pre><h3 id="anchor363">Failure Rules</h3><p>As mentioned above, by default the system automatically tries rules one at a time until it finds the first rule that matches.</p><p>Sometimes, to keep behaviour predictable, it is important to <span style="font-style:italic;">prevent</span> the system from continuing the search if we can determine early that something has gone wrong. To handle this case, we can use a <code>fail-if</code> rule. </p><p>Here is an example:</p><pre><code>public defproduction sentence: String<br><br>defrule sentence = (one big chance) :<br> "One big chance"<br><br>fail-if sentence = (one red chance) :<br> Exception("Sentence doesn't make sense. A chance cannot be red.")<br><br>defrule sentence = (one _ chance) :<br> "One ??? chance"<br><br>defrule sentence = (_ big chance) :<br> "??? big chance"<br><br>defrule sentence = (_ _ _) :<br> "Default case"</code></pre><p>Let's try to parse the following input:</p><pre><code>one red chance</code></pre><p>Our test program will print out:</p><pre><code>[WORK IN PROGRESS]</code></pre><p>Note that this means that the input did not successfully parse. As soon as the system detects that the input matches the pattern <code>one red chance</code> it halts the entire parse. 
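</p><p>As a further illustration, here is a minimal sketch that applies the same idea to the <code>animal</code> production from the earlier <code>myparser</code> package. The <code>unicorn</code> literal, the catch-all rule, and both messages are invented for this example; it assumes only the <code>defrule</code> and <code>fail-if</code> forms shown above.</p><pre><code>defpackage myparser :<br>   import core<br>   import collections<br><br>defsyntax my-experimental-language :<br><br>   public defproduction animal: String<br><br>   ; Known animals are tried first.<br>   defrule animal = (fox) : "little fox"<br>   defrule animal = (dog) : "loyal dog"<br><br>   ; Hypothetical: reject this literal outright instead of<br>   ; letting the catch-all rule below accept it.<br>   fail-if animal = (unicorn) :<br>      Exception("A unicorn is not a supported animal.")<br><br>   ; Hypothetical catch-all for any other single s-expression.<br>   defrule animal = (_) : "unknown animal"</code></pre><p>Under the matching order described earlier, the input <code>unicorn</code> would reach the <code>fail-if</code> rule before the wildcard rule and halt the parse, while any other unlisted animal would fall through to <code>"unknown animal"</code>. 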
</p><p>The general form of a <code>fail-if</code> rule has this structure:</p><pre><code>fail-if production = (pattern) :<br> exception-body</code></pre><p>which says:</p><ol><li>If the system is parsing the production <code>production</code>,
</li><li>and the system detects that the input matches the pattern <code>pattern</code>,
</li><li>then the entire parse is a failure. The <code>exception-body</code> is executed to compute an <code>Exception</code> object that represents the cause of the failure.
</li></ol><p>This description is quite abstract, but we will use this construct later in a larger example that shows the practical situations in which <code>fail-if</code> rules are useful.</p><h3 id="anchor364">Closest Info</h3><p>Within both <code>defrule</code> and <code>fail-if</code> rules, a special function called <code>closest-info</code> can be used to retrieve the file name and line number where the rule first matched. It returns a <code>FileInfo</code> object if there is file information attached, or <code>false</code> otherwise. </p><p>It is most often used in a <code>fail-if</code> rule to provide the location of the error. </p><p>Let's alter the <code>fail-if</code> rule above to the following:</p><pre><code>fail-if sentence = (one red chance) :<br>   match(closest-info()) :<br>      (info:FileInfo) : Exception(to-string("%_: Sentence doesn't make sense. A chance cannot be red." % [info]))<br>      (f:False) : Exception("Sentence doesn't make sense. A chance cannot be red.")</code></pre><p>Now if we parse the following input:</p><pre><code>one red chance</code></pre><p>our test program will print out:</p><pre><code>[WORK IN PROGRESS]</code></pre><h3 id="anchor365">Referencing Productions in Patterns</h3><p>The full expressive power of productions is realized only when we refer to a production from within a pattern. 
To refer to a production, we put the pound character <code>'#'</code> before the production name.</p><p>Here is an example:</p><pre><code>public defproduction sentence: String<br><br>defrule sentence = (A #animal is an animal) :<br>   "Sentence about animals"<br><br>defrule sentence = (I am a #animal) :<br>   "Sentence about what I am"<br><br>defproduction animal: String<br>defrule animal = (dog) : "Dogs"<br>defrule animal = (lion) : "Lions"<br>defrule animal = (meerkat) : "Meerkats"<br>defrule animal = (warthog) : "Warthogs"</code></pre><p>The pattern:</p><pre><code>I am a #animal</code></pre><p>consists of three literals (<code>I</code>, <code>am</code>, and <code>a</code>) that must match exactly, followed by one production <code>#animal</code> that matches only if one of the <code>animal</code> rules matches. </p><p>If we try to parse the following input:</p><pre><code>A dog is an animal</code></pre><p>the system prints out:</p><pre><code>[WORK IN PROGRESS]</code></pre><p>If we try to parse the following input:</p><pre><code>A meerkat is an animal</code></pre><p>the system prints out:</p><pre><code>[WORK IN PROGRESS]</code></pre><p>If we try to parse the following input:</p><pre><code>I am a lion</code></pre><p>the system prints out:</p><pre><code>[WORK IN PROGRESS]</code></pre><p>If we try to parse the following input:</p><pre><code>I am a cat</code></pre><p>the system prints out:</p><pre><code>[WORK IN PROGRESS]</code></pre><h3 id="anchor366">Binders in Patterns</h3><p>The previous section showed that we can refer to other productions from within a pattern. 
But it is unsatisfying that parsing both of the following:</p><pre><code>A dog is an animal<br>A lion is an animal</code></pre><p>outputs the same parsing result:</p><pre><code>"Sentence about animals"</code></pre><p>How would we know which specific animal we're talking about?</p><p>To handle this case, we can use a <span style="font-style:italic;">binder</span> to store the intermediate result of parsing the <code>#animal</code>. </p><p>Make the following change to the definition of our rules:</p><pre><code>defrule sentence = (A ?a:#animal is an animal) :<br> to-string("Sentence about animals. Specifically about %_." % [a])<br><br>defrule sentence = (I am a ?a:#animal) :<br> val len = length(a)<br> val singular = a[0 to len - 1]<br> to-string("Sentence about what I am. I am a %_." % [singular])</code></pre><p>Now if we parse:</p><pre><code>A dog is an animal</code></pre><p>the system prints out:</p><pre><code>[WORK IN PROGRESS]</code></pre><p>If we parse:</p><pre><code>A meerkat is an animal</code></pre><p>the system prints out:</p><pre><code>[WORK IN PROGRESS]</code></pre><p>If we parse:</p><pre><code>I am a lion</code></pre><p>the system prints out:</p><pre><code>[WORK IN PROGRESS]</code></pre><p>If we parse:</p><pre><code>I am a cat</code></pre><p>the system prints out:</p><pre><code>[WORK IN PROGRESS]</code></pre><p>Recall that the <code>animal</code> production was declared like this:</p><pre><code>defproduction animal: String</code></pre><p>This means that the result of parsing an <code>animal</code> is a <code>String</code>.</p><p>Using the following syntax within a rule:</p><pre><code>?a:#animal</code></pre><p>means that we would like to use the variable <code>a</code> to refer to the result of parsing the animal from within the body of the rule.</p><p>If the <code>#animal</code> matches against <code>meerkat</code>, then <code>a</code> will contain the string <code>"Meerkats"</code>. 
If the <code>#animal</code> matches against <code>lion</code>, then <code>a</code> will contain the string <code>"Lions"</code>.</p><p>Binders can be used to refer to the result of any pattern. An especially common and useful case is binding the result of an ellipsis pattern.</p><p>Let's try adding one more rule for the <code>sentence</code> production.</p><pre><code>defrule sentence = (The following are all animals : (?xs:#animal ...)) :<br>   to-string("Sentence listing different animals: %," % [xs])</code></pre><p>Within the body of the rule, the <code>xs</code> variable will refer to the result of parsing the pattern <code>#animal ...</code>. Since the result of parsing <code>#animal</code> is a <code>String</code>, the result of parsing <code>#animal ...</code> will be <code>List&lt;String&gt;</code>. </p><p>If we parse the following:</p><pre><code>The following are all animals :<br>   dog<br>   lion<br>   meerkat<br>   warthog</code></pre><p>the system prints out:</p><pre><code>[WORK IN PROGRESS]</code></pre><p>More generally, a binder can bind to the result of any pattern, using the form: </p><pre><code>?a:PATTERN</code></pre><p>There is a quick shorthand that allows you to omit the pattern entirely:</p><pre><code>?a</code></pre><p>This is synonymous with binding to the wildcard pattern:</p><pre><code>?a:_</code></pre>
<h3 id="anchor367">Advanced Binding Patterns</h3><p>The system properly handles binders when they are nested within list, ellipsis, and splice-ellipsis patterns. Here is an example of a nested binder:</p><pre><code>defrule sentence = (Mean animals :<br>                    (The ?animals:#animal is mean) @...) :<br>   to-string("%, are mean." % [animals])</code></pre><p>If we parse the following:</p><pre><code>Mean animals :<br>   The dog is mean<br>   The lion is mean<br>   The warthog is mean</code></pre><p>then the system prints out:</p><pre><code>[WORK IN PROGRESS]</code></pre><p>Here is a more sophisticated example involving multiple nested binders:</p><pre><code>defrule sentence = (Friendships:<br>                    (The ?animals:#animal likes (?friend-lists:#animal ...)) @...) :<br>   val buffer = StringBuffer()<br>   println(buffer, "Friendships between animals: ")<br>   for (animal in animals, friends in friend-lists) do :<br>      println(buffer, " %_ likes (%,)." % [animal, friends])<br>   to-string(buffer)</code></pre><p>If we parse the following:</p><pre><code>Friendships :<br>   The dog likes (dog, lion)<br>   The meerkat likes (warthog)<br>   The warthog likes (warthog, lion)</code></pre><p>then the system prints out:</p><pre><code>[WORK IN PROGRESS]</code></pre><p>Note that the variable <code>friend-lists</code> has type <code>List&lt;List&lt;String&gt;&gt;</code> within the body of the rule. The pattern <code>#animal ...</code> has result type <code>List&lt;String&gt;</code>, and when it is nested within the splice-ellipsis pattern <code>@...</code>, the final result type becomes <code>List&lt;List&lt;String&gt;&gt;</code>.</p><p>Binders can appear under arbitrary levels of nesting, but we recommend keeping it to no more than two levels in order to keep the code readable.</p>
<h3 id="anchor368">Guard Predicates</h3><p>A <span style="font-style:italic;">guard predicate</span> allows the user to use an arbitrary Stanza function to place further conditions on whether a rule matches or not. </p><p>Suppose that we have a predicate that determines whether a given s-expression might be a string that represents a name. We define a name to be any string that contains exactly one space and otherwise consists only of letters.</p><pre><code>defn name? (x) -> True|False :<br>   match(unwrap-token(x)) :<br>      (s:String) :<br>         val num-spaces = count({_ == ' '}, s)<br>         val num-letters = count(letter?, s)<br>         num-spaces == 1 and<br>         num-letters + 1 == length(s)<br>      (x) :<br>         false</code></pre><p>Suppose we have another predicate that determines whether a given s-expression might be a string that represents an address. We define an address to be any string that consists only of spaces, letters, and digits, and contains at least one of each.</p><pre><code>defn address? (x) -> True|False :<br>   match(unwrap-token(x)) :<br>      (s:String) :<br>         val num-spaces = count({_ == ' '}, s)<br>         val num-digits = count(digit?, s)<br>         val num-letters = count(letter?, s)<br>         num-spaces > 0 and<br>         num-digits > 0 and<br>         num-letters > 0 and<br>         (num-spaces + num-digits + num-letters) == length(s)<br>      (x) :<br>         false</code></pre><p>Now we can use these predicates in the following rules.</p><pre><code>defrule sentence = (Detail for ?a:#animal: ?detail) when name?(detail) :<br>   to-string("The name of %_ is %_." % [a, detail])<br><br>defrule sentence = (Detail for ?a:#animal: ?detail) when address?(detail) :<br>   to-string("The %_ lives at address %_." % [a, detail])<br><br>fail-if sentence = (Detail for ?a:#animal: ?detail) :<br>   Exception("Unsupported detail for %_." % [a])</code></pre><p>Parsing the following:</p><pre><code>Detail for dog: "134 Varsity Avenue"</code></pre><p>results in the following:</p><pre><code>[WORK IN PROGRESS]</code></pre><p>Parsing the following:</p><pre><code>Detail for dog: "Rummy Li"</code></pre><p>results in the following:</p><pre><code>[WORK IN PROGRESS]</code></pre><p>Parsing the following:</p><pre><code>Detail for dog: "105; DROP TABLE Animals"</code></pre><p>results in the following:</p><pre><code>[WORK IN PROGRESS]</code></pre><p>In the above rules the guard predicates place additional conditions on whether a rule matches. 
The first rule matches only if <code>detail</code> is bound to an s-expression that passes our <code>name?</code> predicate. The second rule matches only if <code>detail</code> is bound to an s-expression that passes our <code>address?</code> predicate. Finally, the last <code>fail-if</code> rule specifies explicitly that it is an error if <code>detail</code> does not pass either predicate.</p><h3 id="anchor369">Importing Productions</h3><pre><code>[WORK IN PROGRESS]</code></pre><h2 id="anchor94">The Stanza Core Macros</h2><pre><code>[WORK IN PROGRESS]</code></pre>
</td>
<td class="rest">
<img url="resources/spacer.gif"></img>
</td>
</tr>
<tr><td colspan="3" class="footer">
Site design by Luca Li. Copyright 2015.
</td></tr>
</table>
</body>
</html>