-
Notifications
You must be signed in to change notification settings - Fork 15
Modified accumulo access to only work with String and Unicode. #96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Do we still need the quoting and unquoting of terms in an AccessExpression if the goal is String and Unicode only? |
Yeah still need them, the quoting allows using characters that are reserved for the language itself in auths. Also allows the language to use new characters in the future that are also used in quoted expressions. |
|
Ran the benchmark against commit dd2ef2d from this PR and saw the following. Then ran against c70d418 from main and saw the following. Improved the benchmark times with the changes in 22af52d . These changes use a char[] array instead of String internally to avoid calling with charAt() on string so frequently. Experimented w/ adding char[] methods to the public API (in addition to string methods) and those are faster when you already have a char array. However using the java CharDecoder class to create char[] is much slower than string, so its not a net win. |
| QUOTED, | ||
| /** | ||
| * Denotes that an authorization seen in a valid access expression was unquoted. This means the | ||
| * expression only contains the characters allowed in an unquoted authorization. | ||
| */ | ||
| UNQUOTED |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case where the predicate is used to evaluate a newly constructed Authorizations, then each individual authorization string in that will never be quoted. So, the concept of quoted/unquoted doesn't make sense for that case. Further, if you assume unquoted, and always pass that enum for that case, then it prevents a user from making a more restricted Predicate that ensures all authorizations are quoted or that all of them are unquoted. A user may want such a restricted predicate, to normalize their AccessExpressions, because the quotes still affect the lexical ordering of the expressions for keys stored in Accumulo, or the efficiency of the evaluator cache (because there could be different cache entries for the equivalent access expressions that differ only by optional quotes).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case where the predicate is used to evaluate a newly constructed Authorizations
Improved the naming in f8492ab for this case.
| * @see #builder() | ||
| * @since 1.0 | ||
| */ | ||
| public interface AccumuloAccess { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need "Accumulo" prefix on this object. While Accumulo is the containing project, I think this class could just be called "Access".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made that change in 778c9e4. I wanted a shorted name for this class but was not sure what name to use. Access seems good to me.
| /** | ||
| * @return a pre-allocated empty Authorizations object | ||
| */ | ||
| Authorizations newAuthorizations(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could probably remove this overloaded method. A user can just pass an empty set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed in 8331f63
| * valid | ||
| * @throws NullPointerException when any argument is null | ||
| */ | ||
| void findAuthorizations(String expression, Consumer<String> authorizationConsumer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like it would be useful to have on AccessExpression itself as a non-static method. I'm not sure if it would still be useful here or not, for the case where you don't want to construct the AccessExpression and just want to search in a String. But, I'd lean towards not being useful here. It's probably more useful only on AccessExpression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to move this out of the root level interface, however if that is done then expression would be parsed twice to do the following.
Access access = ...;
access.newExpression("a|b|c").findAuthorizations(System.out::println);It will parse once to validate when creating the AccessExpression and then again when calling findAuthorizations().
| * | ||
| * @throws NullPointerException when the argument is null | ||
| */ | ||
| String quote(String authorization) throws InvalidAuthorizationException; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems a little weird to me to call this "quote" when sometimes it doesn't do any quoting, but I'm not sure of a better name, and probably wouldn't want to overload this with separate "alwaysQuote" and "quoteIfNeeded" methods. At least the Javadoc makes it clear.
| * @throws InvalidAuthorizationException if the expression contains an invalid authorization | ||
| * @throws NullPointerException when the argument is null | ||
| */ | ||
| void validate(String expression) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I like this here. I get the utility of having it to avoid the object allocation if you just want to quickly verify a bunch of expressions, but it feels out of place. This isn't really here to construct or use an object from this library... it's just kind of an extra utility method that allows users to evaluate their own data. But, I think what users really are asking if they were to use this method is "if I were to construct a new AccessExpression from this String, would it succeed?", and I think the only 100% reliable way to do that is to just create a new AccessExpression from this String, which kind of makes this method redundant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is the same as calling newExpression, except it does not allocate an object. So its an optimization. Could remove it, it can be added later. Also could rename it to validateExpression() which lines up w/ the newExpression() method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed to validateExpression in 7a37593. Plan to evaluate removal based on trying to use the changes in accumulo. I believe this is used now in main, but not 100% sure.
| * @param authorizations auths to use in the AccessEvaluator | ||
| * @return AccessEvaluator object | ||
| */ | ||
| AccessEvaluator newEvaluator(Authorizations authorizations); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if this one is needed. It's basically a special case of the other, using a kind of "SetOfAuthorizationsAuthorizer"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These need to be kept because there are internal optimization that can only be done when the whole set is known. These optimization avoid string allocations by transforming the set into an internal representation. This is much faster.
| * | ||
| * </table> | ||
| */ | ||
| AccessEvaluator newEvaluator(Collection<Authorizations> authorizationSets); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If AccessEvaluator were a FunctionalInterface (which I think it can, and should, be), I think this basically becomes something like:
var allEvaluators = authorizationsSets.stream().map(this::newEvaluator).collect(Collectors.toList());
AccessEvaluator aggregateEvaluator = expression -> allEvaluators.stream().allMatch(AccessEvaluator::canAccess);
for (AccessExpression expression : allExpressionsToEvaluate) {
if (aggregateEvaluator.canAccess(expression)) {
// do whatever
}
}I think this is something a user can do themselves (using streams or traditional loops) if they had this use case. I think this is a useful example to document, but I don't think we need it in the main API entry point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method could be moved from the API to an example, opened #98 to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If AccessEvaluator were a FunctionalInterface
Could be tricky to achieve that because it could lead to forcing double parsing (once for validation and once of evaluation) depending on how its done.
core/src/main/java/org/apache/accumulo/access/impl/AuthorizationsImpl.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/accumulo/access/impl/AccessEvaluatorImpl.java
Show resolved
Hide resolved
core/src/main/java/org/apache/accumulo/access/impl/AccessEvaluatorImpl.java
Show resolved
Hide resolved
core/src/main/java/org/apache/accumulo/access/impl/AccessEvaluatorImpl.java
Show resolved
Hide resolved
core/src/main/java/org/apache/accumulo/access/impl/AccessEvaluatorImpl.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/accumulo/access/impl/AccessEvaluatorImpl.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this can be a Record and if there is any benefit to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure, it currently does not have equals(), hashCode(), or toString(). Also, it has a mutable field that is in atomic ref. It could be a record, but the atomic ref feels like it goes against the spirit of a record.
Updates to work with changes in apache/accumulo-access#96. The new core/clientImpl/access/BytesAccess class consildates all the code to work with the new APIs and handle byte[]->String correctly.
|
Updated Accumulo to use these changes in apache/accumulo#6053 and took this out of draft. |
|
IMO we should merge this so that we can iterate on this and the changes in apache/accumulo#6053. |
Updates to work with changes in apache/accumulo-access#96. The new core/clientImpl/access/BytesAccess class consildates all the code to work with the new APIs and handle byte[]->String correctly.
This is still a work in progress, but its close to being complete. Modified the Accumulo Access code to only use String and unicode. Made the following changes.
byte[]in the APIRealized it would be really nice if users could further limit authorizations to a smaller subset of characters while working on this. The existing Accumulo Access APIs had a ton of static entry points and there was no one place that configurable behavior could be introduced . That led to reworking the API to have a single entry point where configuration could be applied instead of the many entry static points. Currently this entry point only allows setting a configurable authorization validator, but it makes it easy to support future configuration. This new single entry point is conceptually similar to how Gson works, it was inspired by that.
Still puzzling about how Accumulo might use this code or if that is possible. Want to experiment w/ that some.
Have not yet run the benchmarks w/ these code changes.
fixes #88