Batches Overview
Batches are a new technique to efficiently invoke remote services, including databases, web services, and distributed objects. Traditionally the interfaces of remote services must be carefully designed to avoid latency and multiple round trips, which arise naturally when fine-grained object-oriented interfaces are used remotely. Common techniques include use of Data Transfer Objects to transfer data in bulk, Server Facades that combine multiple operations to avoid round trips, and query languages to transfer operations from client to servers. Web services, SQL query/resultsets, and RESTful services can be viewed as combinations of these techniques. Batches automate all of these patterns, so that clients and servers can use logical, fine-grained object interfaces while relying on the system to automatically create data transfer objects, server facades, and queries on the fly.
The primary innovation that makes all this possible is a new control flow statement, called a "batch block". To illustrate batch blocks, let's consider what would be involved in taking a local operation and making it remote. As an example, consider a local function that cleans up a directory using the standard java.io.File interface:

    void deleteFiles(File dir) {
        System.out.println("Scanning " + dir.getName());
        for (File file : dir.listFiles())
            if (file.length() > 1000) {
                System.out.println("Deleting " + file.getName());
                file.delete();
            }
    }
The traditional way of thinking about remote objects, as in RMI or CORBA, is that dir could be a proxy for a directory on a remote server. With RMI, this would not work because File does not implement the java.rmi.Remote interface. More significantly, this code would create many round trips. If there are 100 files on the server, there would be at least 101 round trips: one to get the list of files, and one for each call to file.length(). There will be additional round trips for each file to be deleted. Asynchronous message sends are often proposed as a solution to latency, but they don't help much in this case, because the client must wait for the result of the length function before proceeding. A more subtle problem is that the client doesn't care about files that will not be deleted, yet a proxy must be created for each file on the server, even if it will not be deleted.
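To make the round-trip arithmetic concrete, here is a minimal sketch that simulates a naive proxy and charges one round trip per remote call. The `CountingDir` and `CountingFile` classes are hypothetical stand-ins invented for this illustration; they are not part of RMI, CORBA, or the batch system.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a remote file proxy: every call costs one round trip.
class CountingFile {
    private final String name;
    private final long length;
    private final int[] roundTrips; // counter shared with the directory

    CountingFile(String name, long length, int[] roundTrips) {
        this.name = name;
        this.length = length;
        this.roundTrips = roundTrips;
    }

    String getName() { roundTrips[0]++; return name; }
    long length() { roundTrips[0]++; return length; }
    boolean delete() { roundTrips[0]++; return true; }
}

// Hypothetical stand-in for a remote directory proxy.
class CountingDir {
    final int[] roundTrips = {0};
    private final List<CountingFile> files = new ArrayList<>();

    CountingDir(int fileCount, int largeCount) {
        for (int i = 0; i < fileCount; i++) {
            long size = i < largeCount ? 2000 : 10; // the first largeCount files exceed 1000
            files.add(new CountingFile("f" + i, size, roundTrips));
        }
    }

    String getName() { roundTrips[0]++; return "dir"; }
    List<CountingFile> listFiles() { roundTrips[0]++; return files; }
}

class RoundTripDemo {
    // Same control flow as deleteFiles, run against the counting proxies.
    static int countRoundTrips(int fileCount, int largeCount) {
        CountingDir dir = new CountingDir(fileCount, largeCount);
        dir.getName();
        for (CountingFile file : dir.listFiles()) {
            if (file.length() > 1000) {
                file.getName();
                file.delete();
            }
        }
        return dir.roundTrips[0];
    }

    public static void main(String[] args) {
        System.out.println(countRoundTrips(100, 30));
    }
}
```

Under these assumptions, 100 files of which 30 are large cost 1 + 1 + 100 + 2 × 30 = 162 round trips, and even with no large files the loop still costs 102.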
All these problems are solved in the following code, which uses batches to optimize the communication.
    void deleteFiles(Remote<File> dirServer) {
        batch (File dir : dirServer) {
            System.out.println("Scanning " + dir.getName());
            for (File file : dir.listFiles())
                if (file.length() > 1000) {
                    System.out.println("Deleting " + file.getPath());
                    file.delete();
                }
        }
    }
A Remote client wraps an object and provides efficient access to the object and its sub-objects, assuming that the objects are remote. The batch statement binds to the wrapped object and ensures that all operations on the remote object are performed in one round trip. In the above example the batch statement guarantees that one round trip will be performed to list the files, identify the large files, get their names, and delete them. Doing so requires that multiple operations be combined, that results be returned in bulk, and that the file-length criterion be evaluated on the server. However, the entire batch block cannot be sent to the server, because it contains a call to System.out.println, which is only meaningful on the client. The key point is that the batch block can intermix client and remote operations.
The trick is to partition the code into client and server portions. The server part does most of the work. It iterates over the files, identifies the large files, captures their names and deletes them. This operation is described by a batch script:
    OUTPUT("A", ROOT.getName());
    for (file : ROOT.listFiles())
        if (file.length() > 1000) {
            OUTPUT("B", file.getPath());
            file.delete();
        }

The ROOT variable represents the root of the service. The OUTPUT expression writes data into a buffer that will be returned to the client. The tags A and B identify these particular results. Batch scripts can be interpreted or compiled on the server for execution. One important technique is translating them to SQL for execution against a database.
The code that is left over after removing the server operations is the residual client code. It executes the script that was just defined, then processes the results by printing out the names of deleted files.
    void deleteFiles(Remote<File> dirServer) {
        Forest dir = dirServer.execute(...script listed above...);
        System.out.println("Scanning " + dir.getString("A"));
        for (Forest file : dir.getIteration("file"))
            System.out.println("Deleting " + file.getString("B"));
    }
Note that the if statement has been removed, because the server only returns information on files that were actually deleted.
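The shape of the returned data can be illustrated with a minimal sketch. The `SketchForest` class below is a hypothetical, simplified stand-in for the real Forest type: it assumes only the two accessors used above (`getString` and `getIteration`) and stores tagged values in plain maps. The sample paths are made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified forest: tagged values plus named iterations of child forests.
class SketchForest {
    private final Map<String, Object> values = new HashMap<>();
    private final Map<String, List<SketchForest>> iterations = new HashMap<>();

    SketchForest put(String tag, Object value) {
        values.put(tag, value);
        return this;
    }

    SketchForest addIteration(String name, SketchForest child) {
        iterations.computeIfAbsent(name, k -> new ArrayList<>()).add(child);
        return this;
    }

    String getString(String tag) {
        return String.valueOf(values.get(tag));
    }

    List<SketchForest> getIteration(String name) {
        return iterations.getOrDefault(name, new ArrayList<>());
    }
}

class ForestDemo {
    // Residual client code: collects the lines the client would print.
    static List<String> report(SketchForest dir) {
        List<String> lines = new ArrayList<>();
        lines.add("Scanning " + dir.getString("A"));
        for (SketchForest file : dir.getIteration("file")) {
            lines.add("Deleting " + file.getString("B"));
        }
        return lines;
    }

    // What a server might return after deleting two large files (made-up data).
    static SketchForest sampleResult() {
        return new SketchForest()
            .put("A", "logs")
            .addIteration("file", new SketchForest().put("B", "/logs/a.log"))
            .addIteration("file", new SketchForest().put("B", "/logs/b.log"));
    }

    public static void main(String[] args) {
        report(sampleResult()).forEach(System.out::println);
    }
}
```

Only deleted files appear under the "file" iteration, which is why the client loop needs no condition.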
The following sections detail how batches are used and implemented, covering the following topics:
- Syntax and semantics of the cross-platform batch script notation.
- Format of batch results, also known as forests.
- Details on the batch statement in Java.
- The rules for service interfaces, and in particular how to define database interfaces.
- How to implement batch servers.
Batch script is a subset of JavaScript designed specifically for communicating batches of actions between a client and a server. It is important to keep in mind that programmers should (almost) never write batch script directly. Instead they write a batch block in (an extended version of) their favorite programming language, as illustrated above, and the system generates the batch script and retrieves its results in bulk. In a sense batch script is analogous to JSON, but rather than representing data it represents actions. Batch script has very restricted syntax and semantics. Batch script is also analogous to SQL, but generalized to allow invocation of arbitrary remote services, not just manipulation of databases. Batch script is not meant to be a full, general-purpose programming language.
The syntax of batch script is defined by the following BNF. This is a human-readable BNF. A machine readable implementation might have to resolve left recursion, precedence, and associativity. Batch scripts are much more restrictive than JavaScript. For example, curly braces are required for all nested statements, in for, if and function expressions. Semicolons are required to separate all statements, including after a block.
    statements ::= command (";" statements?)?
    command    ::= "for" "(" id "in" expr ")" block
                 | "if" expr "then" block ("else" block)?
                 | "var" id "=" expr
                 | expr
    block      ::= "{" statements? "}"
    expr       ::= un-op expr
                 | expr bin-op expr
                 | expr "?" expr ":" expr
                 | prim
    bin-op     ::= "==" | "!=" | "<=" | "<" | ">" | ">=" | "||" | "&&"
                 | "+" | "-" | "*" | "/" | "="
    un-op      ::= "!"
    prim       ::= literal
                 | id
                 | prim "." id args?
                 | "INPUT" "(" string ")"
                 | "OUTPUT" "(" string "," expr ")"
                 | "function" "(" id ")" block
                 | "(" expr ")"
    literal    ::= integer | float | string | "true" | "false"
    args       ::= "(" expr ("," expr)* ")"
Batch scripts support the following capabilities:
- Invoke methods on the root object, or on any object returned by a method on the root object or its subobjects. Only methods published by the server can be invoked.
- Use simple control flow, including sequences, conditionals, and loops over a collection.
- Declare local variables and use them in computations.
- Read input data from the client (INPUT) and write outputs to the client (OUTPUT).
- Perform primitive operations on primitive data types. There is a fixed list of supported primitive data types and operations (TODO: create a table of them).
- Use literal primitive values.
- Define first-class functions, which can be passed to server routines to filter or aggregate collections.
Batch scripts on their own cannot:
- Invoke static methods.
- Look up classes or call class constructors. A service can define factory methods as part of its interface, so object construction is still possible.
- Use "while" loops, because they are unbounded.
- Handle exceptions.
- Return a handle or proxy to a server object. A service can hand out user-defined object handles if desired. They should be created with a time-out, however.
The abstract syntax of batch script is defined by the BatchFactory interface.
    public interface BatchFactory<E> {
        E Var(String name);                           // variable reference
        E Data(Object value);                         // simple constant (number, string, or date)
        E Fun(String var, E body);                    // first-class function
        E Prim(Op op, List<E> args);                  // unary and binary operators
        E Prop(E base, String field);                 // field access
        E Assign(E target, E source);                 // assignment
        E Let(String var, E expression, E body);      // local variable
        E If(E condition, E thenExp, E elseExp);      // conditional
        E Loop(String var, E collection, E body);     // loop
        E Call(E target, String method, List<E> args); // method invocation
        E In(String location);                        // read a value from the input forest
        E Out(String location, E expression);         // write a value to the output forest
    }
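Because the factory is parameterized over the expression type E, one implementation can simply render script text. The sketch below is illustrative only: `MiniBatchFactory` is a trimmed copy of the interface with just the methods this example needs, `SketchOp` is a hypothetical operator enum (the real Op type is not shown here), and the exact text layout is an assumption that follows the BNF above.

```java
import java.util.List;

// Hypothetical operator set; the real Op type is not shown in this document.
enum SketchOp { SEQ, GT }

// Trimmed copy of BatchFactory with only the methods this example needs.
interface MiniBatchFactory<E> {
    E Var(String name);
    E Data(Object value);
    E Prim(SketchOp op, List<E> args);
    E If(E condition, E thenExp, E elseExp);
    E Loop(String var, E collection, E body);
    E Call(E target, String method, List<E> args);
    E Out(String location, E expression);
}

// Interprets each constructor as "produce batch script text".
class ScriptPrinter implements MiniBatchFactory<String> {
    public String Var(String name) { return name; }

    public String Data(Object value) {
        return value instanceof String ? "\"" + value + "\"" : String.valueOf(value);
    }

    public String Prim(SketchOp op, List<String> args) {
        String separator = (op == SketchOp.SEQ) ? "; " : " > ";
        return String.join(separator, args);
    }

    public String If(String condition, String thenExp, String elseExp) {
        String text = "if (" + condition + ") then { " + thenExp + " }";
        return elseExp == null ? text : text + " else { " + elseExp + " }";
    }

    public String Loop(String var, String collection, String body) {
        return "for (" + var + " in " + collection + ") { " + body + " }";
    }

    public String Call(String target, String method, List<String> args) {
        return target + "." + method + "(" + String.join(", ", args) + ")";
    }

    public String Out(String location, String expression) {
        return "OUTPUT(\"" + location + "\", " + expression + ")";
    }
}

class FactoryDemo {
    // Builds the deleteFiles script against any factory implementation.
    static <E> E deleteFilesScript(MiniBatchFactory<E> f) {
        List<E> none = List.of();
        E file = f.Var("file");
        E isLarge = f.Prim(SketchOp.GT, List.of(f.Call(file, "length", none), f.Data(1000)));
        E body = f.Prim(SketchOp.SEQ, List.of(
            f.Out("B", f.Call(file, "getPath", none)),
            f.Call(file, "delete", none)));
        E loop = f.Loop("file", f.Call(f.Var("ROOT"), "listFiles", none),
            f.If(isLarge, body, null));
        return f.Prim(SketchOp.SEQ, List.of(
            f.Out("A", f.Call(f.Var("ROOT"), "getName", none)), loop));
    }

    public static void main(String[] args) {
        System.out.println(deleteFilesScript(new ScriptPrinter()));
    }
}
```

The same `deleteFilesScript` method could be handed a different factory, for example one that builds an evaluator or a SQL translation, without changing how the script is constructed.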
Batch script is a form of "mobile code" but not in the sense that the phrase is normally understood. In particular batch scripts are independent of any bytecode format and data representation.
The following primitive operations can be performed on remote values:

| Operator | Arguments |
| --- | --- |
| + | Number* --> Number; Date, Duration --> Date; Duration, Duration --> Duration; Primitive*, String, Primitive* --> String |
| - | Number* --> Number; Date, Date --> Duration; Date, Duration --> Date; Duration, Duration --> Duration |
| ×, ÷ | Number* --> Number |
| =, ≠ | X, X --> Boolean |
| <, >, <=, >= | Number, Number --> Boolean; Date, Date --> Boolean; Duration, Duration --> Boolean; String, String --> Boolean |
| ∨, ∧ | Boolean* --> Boolean |
| ¬ | Boolean --> Boolean |
| ; (SEQ) | Void* --> Void |
| AVG, Min, Max | Number* --> Number |
| Length | String --> Integer; RawData --> Integer |
| Substring | String, Integer, Integer --> String |
Data type mapping:

| Batch type | Java types |
| --- | --- |
| Void | void |
| String | String, enum |
| Integer | byte, short, int, long |
| Float | float, double |
| Decimal | java.math.BigDecimal |
| Date | java.sql.Date |
| DateTime | java.sql.Timestamp, java.util.Date |
| Duration | java.sql.Time, javax.xml.datatype.Duration |
| RawData | byte[] |
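The mapping can be expressed as a simple lookup. The following is a minimal sketch, not the system's actual implementation: the class name `BatchTypes` and the method `batchTypeOf` are hypothetical, and the sketch assumes that Decimal corresponds to `java.math.BigDecimal`.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;

// Hypothetical lookup from Java types to batch primitive type names.
class BatchTypes {
    private static final Map<Class<?>, String> TYPES = new HashMap<>();

    static {
        TYPES.put(void.class, "Void");
        TYPES.put(String.class, "String");
        for (Class<?> c : new Class<?>[] {byte.class, short.class, int.class, long.class}) {
            TYPES.put(c, "Integer");
        }
        TYPES.put(float.class, "Float");
        TYPES.put(double.class, "Float");
        TYPES.put(BigDecimal.class, "Decimal"); // assumption: Decimal means BigDecimal
        TYPES.put(java.sql.Date.class, "Date");
        TYPES.put(java.sql.Timestamp.class, "DateTime");
        TYPES.put(java.util.Date.class, "DateTime");
        TYPES.put(java.sql.Time.class, "Duration");
        TYPES.put(javax.xml.datatype.Duration.class, "Duration");
        TYPES.put(byte[].class, "RawData");
    }

    // Returns the batch type name for an exact Java type; enums map to String.
    static String batchTypeOf(Class<?> javaType) {
        if (javaType.isEnum()) {
            return "String";
        }
        String name = TYPES.get(javaType);
        if (name == null) {
            throw new IllegalArgumentException("not a primitive batch type: " + javaType.getName());
        }
        return name;
    }

    public static void main(String[] args) {
        System.out.println(batchTypeOf(long.class));
        System.out.println(batchTypeOf(byte[].class));
    }
}
```

The lookup is by exact class, so `java.sql.Date` maps to Date even though it is a subclass of `java.util.Date`, which maps to DateTime.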
This section will give more details about the "batch" statement and how the local and remote code is partitioned.
- Requirement for Local/Remote/Local
In Java, a service can be specified by a collection of interfaces and/or abstract classes with only abstract members. Abstract classes are useful because they can include fields. Fields in a service object are virtual, in that they are frequently implemented via methods on the server. But it is very convenient for programmers to use the field syntax, as in C#. Enumerations can also be used, although they are considered short-hand for string constants.
The types in the service are all types reachable from the root interface by calling methods or accessing fields. The primitive types have a fixed mapping, as defined in the data type mapping above.
Methods defined in Object are not considered part of the service. A service implementor has complete control over the interface to a service object, since no methods are provided by default. As an example, consider the following service:
    public abstract class TestObj {
        public abstract double process(int x);     // service method
        public abstract TestObj find(String name); // returns another service object
        public byte[] image;                       // virtual field
    }
Although any service operation can raise an exception, service interfaces do not specify checked exceptions as in Java. Exception type declarations are considered comments on the service interface, but have no semantic significance. See the section on exceptions for more details.
Any object or interface can be part of a service interface. Even library or system objects, such as HashMap and Thread, can be included in a service if desired. Service objects may be generic types.
One restriction is that service interfaces do not support overloading. This is particularly problematic because all service parameters are optional: a default value is passed for any method call that does not specify all parameters. [what happens if the object/interface has overloading?]
A first-class function is defined using a function object, as in Scala:

    public interface Fun<S, T> { T apply(S x); }
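As an illustration of how such a function object might be passed to a routine that filters a collection, here is a minimal sketch. The `filter` and `largeSizes` methods are hypothetical and exist only for this example; the sketch assumes Fun takes an argument of type S and returns a result of type T.

```java
import java.util.ArrayList;
import java.util.List;

// Function object with one argument of type S and a result of type T.
interface Fun<S, T> { T apply(S x); }

class FunDemo {
    // Hypothetical routine that accepts a first-class function as a filter.
    static <S> List<S> filter(List<S> items, Fun<S, Boolean> keep) {
        List<S> kept = new ArrayList<>();
        for (S item : items) {
            if (keep.apply(item)) {
                kept.add(item);
            }
        }
        return kept;
    }

    // Keeps only the sizes above the limit, using a lambda as the function object.
    static List<Integer> largeSizes(List<Integer> sizes, int limit) {
        return filter(sizes, size -> size > limit);
    }

    public static void main(String[] args) {
        System.out.println(largeSizes(List.of(10, 2000, 500, 1500), 1000));
    }
}
```

Because Fun has a single abstract method, a Java lambda can stand in for the function object.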
- Special case for "insert"
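    interface Service {
        Person setupPerson();
        String insertPerson(@NamedParam Person p);
    }
    // Declared as an abstract class because Java interfaces cannot hold instance fields.
    abstract class Person {
        public String name;
        public int age;
        public Date birthday;
        public BigDecimal salary;
    }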
It is invoked as follows:
    Person p = s.setupPerson();
    p.name = "William";
    p.age = 40;
    p.birthday = new Date(2010, 1, 1);
    String key = s.insertPerson(p);
Batch services are invoked using a batch statement. A batch statement provides access to an identifier representing the service root, whose operations can be used within the block of code. In Java, a batch is defined as a special kind of for statement. Assuming testService represents a service whose root is of type TestObj, the service can be invoked as follows:
    int limit = 100;
    String name = new java.util.Scanner(System.in).nextLine(); // read one line from standard input
    for (TestObj r : testService) {
        for (TestObj x : r.items())
            if (x.size() > limit) {
                print( "first = " + r.process(11) );
                print( "second = " + r.find(name).process(42) );
            }
    }
The batch statement has an identifier representing the service root. In the example above, the service root identifier is r. The batch statement refers to a service object, in this case called testService. Finally, the batch statement has a block of code, which can mix local computations and service calls.
The semantics of the batch statement is that all operations involving the remote service are executed together as a group.
All data required as inputs to the service operations are sent in bulk to the server. All outputs from the service operations are returned in bulk to the client. Once the service execution is complete, the remaining client code is executed.
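The three phases (local, remote, local) can be sketched as follows. This is only an illustration under stated assumptions: `executeOnServer` is a hypothetical stand-in for a single round trip, and the arithmetic inside it is a placeholder for the real service operations.

```java
import java.util.HashMap;
import java.util.Map;

class PhaseDemo {
    // Hypothetical stand-in for the server: one call carries all inputs
    // and returns all outputs.
    static Map<String, Object> executeOnServer(Map<String, Object> inputs) {
        int x = (Integer) inputs.get("x");
        String name = (String) inputs.get("name");
        Map<String, Object> outputs = new HashMap<>();
        outputs.put("first", x * 2.0);              // placeholder for r.process(x)
        outputs.put("second", name.length() * 1.0); // placeholder for r.find(name).process(42)
        return outputs;
    }

    static String run(int x, String name) {
        // Phase 1 (local): gather every input the remote operations need.
        Map<String, Object> inputs = new HashMap<>();
        inputs.put("x", x);
        inputs.put("name", name);

        // Phase 2 (remote): a single round trip executes all service operations.
        Map<String, Object> outputs = executeOnServer(inputs);

        // Phase 3 (local): the remaining client code consumes the bulk results.
        return "first = " + outputs.get("first") + ", second = " + outputs.get("second");
    }

    public static void main(String[] args) {
        System.out.println(run(11, "horse"));
    }
}
```

No client code runs between the service operations, which is what allows them to be grouped into one exchange.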
Service objects cannot be accessed outside a batch block. This means that service objects cannot be assigned to local variables (or fields) or passed to local procedures.
The language should support batch procedures whose body is partitioned and included in the batch.
Jaba is a special Java compiler that supports easy service invocation from Java programs. Jaba does not make any changes to Java syntax. It does, however, modify the semantics of the Java for statement to allow it to operate over a service. As an example, invoking a TestObj service looks like this:
    int input = ...;
    InetAddress address = ...;
    for (TestObj root : new TCPClient(address, 8080)) {
        System.out.println( "result = " + root.process(input) );
        window.show( root.find("horse.jpg").image );
    }
The batch for statement includes both local computations (System.out.println, window.show) and service operations (root.process, root.find("horse.jpg").image). The remainder of this document describes how Jaba executes these special for statements efficiently.
SQL
table attribute
column attribute
Relationships and Set interface
inverse attribute
date type mapping
batch methods for synonyms
many-to-many relationships
for
selecting (SELECT)
relationships (JOIN)
filtering (WHERE)
sorting (ORDER BY)
aggregation (COUNT, SUM, EXISTS, etc)
creation (INSERT): using named arguments; retrieving the new object ID; direct insert; insert into a set
assignment (UPDATE)
delete (DELETE)
partitioning and ordering modifications
XML
JSON