Prevents iterator conflicts #6040
base: 2.1
This commit adds checks, when adding an iterator, that the given iterator does not conflict with any existing iterators. A conflict means an iterator with the same name or the same priority. Iterators can be added in several ways, and previously only TableOperations.attachIterator and NamespaceOperations.attachIterator checked for conflicts. This commit adds iterator conflict checks to:

- Scanner.addScanIterator
- TableOperations.setProperty
- TableOperations.modifyProperties
- NewTableConfiguration.attachIterator

Note that this does not add conflict checks to NamespaceOperations.setProperty or NamespaceOperations.modifyProperties; these will be done in another commit.

This commit also accounts for the several ways in which conflicts can arise:

- Iterators attached directly to a table (through TableOperations.attachIterator, TableOperations.setProperty, or TableOperations.modifyProperties)
- Iterators attached to a namespace and inherited by a table (through NamespaceOperations.attachIterator, NamespaceOperations.setProperty, or NamespaceOperations.modifyProperties)
- Conflicts with the default table iterators (if the table has them)
- Adding the exact iterator that is already present should not fail

This commit also adds a new IteratorConflictsIT to test all of the above.

Part of apache#6030
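The conflict rules above can be summarized in a small, self-contained sketch. It assumes a hypothetical `IterConfig` record (name, priority, class, options) standing in for Accumulo's IteratorSetting, and treats "exactly the same" as equal in all four fields. It illustrates the rules only; it is not the code in this PR.

```java
import java.util.List;
import java.util.Map;

class ConflictRules {
  // Hypothetical stand-in for an iterator configuration; NOT Accumulo's IteratorSetting.
  record IterConfig(String name, int priority, String className, Map<String,String> options) {}

  /**
   * Returns a description of the first conflict between the candidate and the existing
   * iterators, or null if there is none. Same name or same priority is a conflict, unless
   * the candidate is exactly the same iterator (name, priority, class, and options).
   */
  static String findConflict(List<IterConfig> existing, IterConfig candidate) {
    for (IterConfig e : existing) {
      if (e.equals(candidate)) {
        continue; // re-adding the identical iterator is allowed
      }
      if (e.name().equals(candidate.name())) {
        return "name conflict with " + e.name();
      }
      if (e.priority() == candidate.priority()) {
        return "priority conflict with " + e.name() + " at " + e.priority();
      }
    }
    return null;
  }

  public static void main(String[] args) {
    var vers = new IterConfig("vers", 20, "org.example.VersIter", Map.of("maxVersions", "1"));
    var existing = List.of(vers);
    // Identical iterator: not a conflict.
    System.out.println(findConflict(existing, vers));
    // Same name, different priority: conflict.
    System.out.println(findConflict(existing,
        new IterConfig("vers", 30, "org.example.VersIter", Map.of("maxVersions", "1"))));
    // Different name, same priority: conflict.
    System.out.println(findConflict(existing,
        new IterConfig("filter", 20, "org.example.MyFilter", Map.of())));
  }
}
```

Note that, under these rules, the same name, priority, and class with different options still counts as a name conflict.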
Adds conflict checks to:

- NamespaceOperations.attachIterator (previously only checked for conflicts with iterators in the namespace; now also checks for conflicts with iterators in the tables of the namespace)
- NamespaceOperations.setProperty (checks for conflicts with namespace iterators and all tables in the namespace)
- NamespaceOperations.modifyProperties (checks for conflicts with namespace iterators and all tables in the namespace)

Adds new tests to IteratorConflictsIT to test the above.
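Because an iterator set on a namespace is inherited by every table in it, a namespace-level check has to look at the namespace's own iterators and at each table's iterators. A minimal sketch, assuming a hypothetical `Iter` record and a plain map from table name to that table's iterators (neither is Accumulo API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class NamespaceConflictCheck {
  // Hypothetical simplified iterator description; not an Accumulo type.
  record Iter(String name, int priority, String className) {}

  // Same name or same priority conflicts, unless the two iterators are identical.
  static boolean conflicts(Iter a, Iter b) {
    return !a.equals(b) && (a.name().equals(b.name()) || a.priority() == b.priority());
  }

  /** Collects conflicts with the namespace's iterators and with every table in the namespace. */
  static List<String> findConflicts(List<Iter> namespaceIters,
      Map<String,List<Iter>> tableIters, Iter candidate) {
    List<String> found = new ArrayList<>();
    for (Iter it : namespaceIters) {
      if (conflicts(it, candidate)) {
        found.add("namespace:" + it.name());
      }
    }
    for (Map.Entry<String,List<Iter>> e : tableIters.entrySet()) {
      for (Iter it : e.getValue()) {
        if (conflicts(it, candidate)) {
          found.add(e.getKey() + ":" + it.name());
        }
      }
    }
    return found;
  }

  public static void main(String[] args) {
    var nsIters = List.of(new Iter("nsIter", 10, "org.example.A"));
    var tables = Map.of("ns.t1", List.of(new Iter("tableIter", 20, "org.example.B")));
    // Priority 20 is free at the namespace level but already taken by a table iterator.
    System.out.println(findConflicts(nsIters, tables, new Iter("newIter", 20, "org.example.C")));
  }
}
```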
From running the sunny day tests and all the tests I changed in this PR, I noticed that I unknowingly added new permission requirements to at least TableOperations.create() (new required permission ALTER_NAMESPACE) and Scanner.addScanIterator() (new required permission ALTER_TABLE). I imagine this is a blocker for these changes at this point, but let me know if it's not. I'll look into an alternative that avoids these permissions. See the changes to ConditionalWriterIT, ScanIteratorIT, and ShellServerIT for examples of the failures I encountered.
Checks are now done server side as of cb2eccb, avoiding these permission requirements.
core/src/main/java/org/apache/accumulo/core/client/admin/NewTableConfiguration.java
core/src/main/java/org/apache/accumulo/core/client/admin/NewTableConfiguration.java
test/src/main/java/org/apache/accumulo/test/functional/IteratorConflictsIT.java
core/src/main/java/org/apache/accumulo/core/client/admin/NewTableConfiguration.java
```java
TableOperationsHelper.checkIteratorConflicts(noDefaultsPropMap, setting, scopes);
TableOperationsHelper.checkIteratorConflicts(propertyMap, setting, scopes);
```
I could remove noDefaultsPropMap, since I pushed the equality check into checkIteratorConflicts.
```java
String valStr = String.format("%s,%s", setting.getPriority(), setting.getIteratorClass());
Map<String,String> optionConflicts = new TreeMap<>();
// skip if the setting is present in the map... not a conflict if exactly the same
if (props.containsKey(nameStr) && props.get(nameStr).equals(valStr)
    && IteratorConfigUtil.containsSameIterOpts(props, setting, optStr)) {
  continue;
}
```
This method is the same as before except for the addition of valStr and this if check.
I moved it here because the same code was used by both TableOperationsHelper.checkIteratorConflicts and NamespaceOperationsHelper.checkIteratorConflicts.
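For reference, Accumulo stores per-table iterator settings as properties of the form `table.iterator.<scope>.<name>` = `<priority>,<class>`, with options under `table.iterator.<scope>.<name>.opt.<key>`. The skip-if-identical check can be sketched against that layout as follows. `isExactlyPresent` is a hypothetical helper, and requiring the option sets to be exactly equal is an assumption; the real `IteratorConfigUtil.containsSameIterOpts` may differ in detail.

```java
import java.util.Map;
import java.util.TreeMap;

class SameIteratorCheck {
  /**
   * Returns true if props already hold exactly this iterator for the given scope:
   * the same priority and class under table.iterator.<scope>.<name>, and the same options.
   */
  static boolean isExactlyPresent(Map<String,String> props, String scope, String name,
      int priority, String className, Map<String,String> options) {
    String nameStr = "table.iterator." + scope + "." + name;
    String valStr = String.format("%s,%s", priority, className);
    if (!valStr.equals(props.get(nameStr))) {
      return false; // absent, or present with a different priority/class
    }
    // Gather the options currently configured for this iterator.
    String optPrefix = nameStr + ".opt.";
    Map<String,String> existingOpts = new TreeMap<>();
    for (Map.Entry<String,String> e : props.entrySet()) {
      if (e.getKey().startsWith(optPrefix)) {
        existingOpts.put(e.getKey().substring(optPrefix.length()), e.getValue());
      }
    }
    return existingOpts.equals(options);
  }

  public static void main(String[] args) {
    String cls = "org.apache.accumulo.core.iterators.user.VersioningIterator";
    Map<String,String> props = Map.of(
        "table.iterator.scan.vers", "20," + cls,
        "table.iterator.scan.vers.opt.maxVersions", "1");
    // Same priority, class, and options: treated as already present.
    System.out.println(isExactlyPresent(props, "scan", "vers", 20, cls, Map.of("maxVersions", "1")));
    // Different option value: not identical, so the normal conflict check applies.
    System.out.println(isExactlyPresent(props, "scan", "vers", 20, cls, Map.of("maxVersions", "3")));
  }
}
```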
Transferring to WIP until I resolve #6040 (review).
core/src/main/java/org/apache/accumulo/core/clientImpl/ClientContext.java
core/src/main/java/org/apache/accumulo/core/client/admin/NewTableConfiguration.java
core/src/main/java/org/apache/accumulo/core/client/admin/NewTableConfiguration.java
Discussed iterator conflicts today, and here's a summary of some key points:
- Moves the iterator conflict check for create table from the client side to the server side.
- Moves the check that iterators added to a scanner do not conflict with those already set on the table from the client side to the server side.
- Adds iterator conflict checks to CloneConfiguration.Builder.setPropertiesToSet. This check is done server side.
- Adds testing to IteratorConflictsIT for CloneConfiguration.Builder.setPropertiesToSet.
```java
assertThrows(exceptionClass, iterPrioConflictExec);
assertThrows(exceptionClass, iterNameConflictExec);
```
Removed the exception message check, at least for now. #6048 is needed for the server to throw the exception back to the user for CREATE_TABLE and CLONE_TABLE. Scanner exceptions have a similar issue.
```java
// iterator options.

// First ensure the set iterators do not conflict with the existing table iterators.
for (var scanParamIterInfo : scanParams.getSsiList()) {
  // ...
}
```
This code is doing a lot of work and I suspect it could be problematic for small scans w/ a few iterators.
- Parsed iterator configuration is unparsed and then parsed.
- For each scan iterator it loops over all tablet iterators, so this seems like `O(M*N)` type behavior. The unparsing and parsing is done `M*N` times.
The ParsedIteratorConfig class was created to cache parsed table config because profiling showed that, for small scans, significant time was spent parsing the table config. ParsedIteratorConfig is automatically cached per table and only recreated when the table config changes; this avoids each scan having to do the redundant work of parsing the properties.
Avoiding the O(M*N) work and avoiding unparsing the data would probably make this much faster. I'm not sure exactly how to solve this puzzle, but I suspect the following refactor might help.
- Modify the validation code to work on parsed iterator configuration. Currently `checkIteratorConflicts` parses and validates all together. Maybe it could be refactored to take `List<IteratorSetting>` instead of `Map<String,String> props`. This may make the code easier to understand.
- With the above change, checking for iterator conflicts would parse in one method and then check/validate in another method.
- If we had a `checkIteratorConflicts` method that took parsed config, then the scan code could call it directly with its existing parsed iterator config. This would avoid unparsing and then reparsing the data.
The above might be a good general improvement to the code, but I'm not completely sure it solves the problem, and also not sure it will completely solve the O(M*N) problem.
Also curious if the validation could efficiently be done in the existing IteratorConfigUtil.mergeIteratorConfig() code, but not sure about that. Suspect having checkIteratorConflicts work on parsed config would make all of this code easier to understand and more efficient, so that may help answer questions like this.
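One way to avoid the `O(M*N)` nested loop, assuming the table's iterators for a scope are already parsed (as ParsedIteratorConfig caches them), is to index them once by name and by priority, then do constant-time lookups per scan iterator, giving roughly `O(N + M)`. The sketch below uses a hypothetical `Iter` record and `ConflictIndex` class rather than Accumulo types, and omits option comparison for brevity.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ConflictIndex {
  // Hypothetical parsed iterator entry; options omitted for brevity.
  record Iter(String name, int priority, String className) {}

  private final Map<String,Iter> byName = new HashMap<>();
  private final Map<Integer,Iter> byPriority = new HashMap<>();

  /** O(N) build over the table's N iterators; could be cached alongside the parsed config. */
  ConflictIndex(List<Iter> tableIters) {
    for (Iter it : tableIters) {
      byName.put(it.name(), it);
      byPriority.put(it.priority(), it);
    }
  }

  /** Expected O(1) per scan iterator: returns the conflicting table iterator, or null. */
  Iter conflictWith(Iter scanIter) {
    Iter sameName = byName.get(scanIter.name());
    if (sameName != null && !sameName.equals(scanIter)) {
      return sameName;
    }
    Iter samePriority = byPriority.get(scanIter.priority());
    if (samePriority != null && !samePriority.equals(scanIter)) {
      return samePriority;
    }
    return null;
  }

  public static void main(String[] args) {
    var index = new ConflictIndex(List.of(new Iter("vers", 20, "org.example.Vers"),
        new Iter("myFilter", 30, "org.example.Filter")));
    // M scan iterators, each checked with two hash lookups instead of a pass over all N.
    for (Iter scanIter : List.of(new Iter("extra", 25, "org.example.Extra"),
        new Iter("other", 30, "org.example.Other"))) {
      System.out.println(scanIter.name() + " -> " + index.conflictWith(scanIter));
    }
  }
}
```

If the table config itself already has two iterators sharing a priority, `byPriority` keeps only one of them; that pre-existing conflict would need to be reported separately.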
Maybe a refactoring like the following could help speed up the scan code. Maybe Map<String, IteratorSetting> parsed will help avoid redundant parsing work as each scan iterator is checked.
```java
public static void checkIteratorConflicts(Map<String,IteratorSetting> parsed,
    IteratorSetting setting) throws AccumuloException {
  // parsed is keyed on IteratorSetting.name
  var existing = parsed.get(setting.getName());
  if (existing != null) {
    // TODO check for conflicts like the current code does on unparsed config
  }
}

public static void checkIteratorConflicts(Map<String,String> props, IteratorSetting setting,
    EnumSet<IteratorScope> scopes) throws AccumuloException {
  for (IteratorScope scope : scopes) {
    Map<String,IteratorSetting> parsed = parseIteratorConfig(props, scope);
    checkIteratorConflicts(parsed, setting);
  }
}
```

For performance, probably only the scan code really matters when refactoring this code. I don't really care about the performance of this code when setting iterators on a table or similar operations; nothing else will be executed as frequently as scans.
Good point about tailoring the validation code to scans.
Pushed f422862 to address your suggestion. Let me know your thoughts.
I'm not really sure we can avoid the O(N*M) work altogether, but it does less work in the loops now.
> Also curious if the validation could efficiently be done in the existing IteratorConfigUtil.mergeIteratorConfig()

Unfortunately not, since there is no existing loop there that we could extend to check for conflicts. That method loops through the table iterator options and the scan parameters' iterator options, whereas checking for conflicts requires iterating through the iterator infos.
@kevinrr888 why did you choose to do this work in 2.1 instead of in main? It seems there is a chance of introducing new bugs in scans or compactions. It may also make config that used to work stop working (that is probably a good thing overall, as it can help detect existing problems, but it could introduce temporary pain). I am not opposed to making this change in 2.1, but was just curious.
@keith-turner I had already started this work in 2.1 with #5990, thinking this was a one-off issue with NewTableConfiguration. I did not anticipate follow-on work requiring changes in so many areas, so I continued with 2.1, learning the scope of the issue as I went. I also thought this validation would be good to have in the earliest version possible, since the lack of it is essentially a bug. I would be fine refactoring this for main if we think it is too risky or undesirable for 2.1.
There are benefits and risks with this change. Maybe the best way to get the benefits and lower the risk is to make these changes only warn in 2.1 and fail in 4.0? That way, things that were working in 2.1.4 and earlier do not blow up in 2.1.5; they still work, but get a warning that the iterator config is not correct and could lead to non-deterministic behavior.
That works for me; I'll change this.
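The warn-in-2.1, fail-in-4.0 approach amounts to routing every detected conflict through a single policy decision. A minimal sketch, assuming a hypothetical `ConflictPolicy.Mode` switch. It uses `java.util.logging` only to keep the example dependency-free (Accumulo itself logs through SLF4J), and `IllegalArgumentException` is a stand-in for whatever exception the real check throws.

```java
import java.util.logging.Logger;

class ConflictPolicy {
  // Hypothetical switch: WARN for the 2.1 line, FAIL for 4.0.
  enum Mode { WARN, FAIL }

  private static final Logger LOG = Logger.getLogger(ConflictPolicy.class.getName());

  /**
   * Handles a detected iterator conflict. In FAIL mode the operation is rejected; in WARN mode
   * the configuration is accepted (preserving pre-existing behavior) but a warning is logged.
   */
  static void onConflict(Mode mode, String description) {
    String msg = "Iterator conflict: " + description
        + "; iterator behavior may be non-deterministic";
    if (mode == Mode.FAIL) {
      throw new IllegalArgumentException(msg);
    }
    LOG.warning(msg);
  }

  public static void main(String[] args) {
    // WARN: logs and lets the operation continue.
    onConflict(Mode.WARN, "priority 20 used by both 'vers' and 'myIter'");
    // FAIL: rejects the operation.
    try {
      onConflict(Mode.FAIL, "priority 20 used by both 'vers' and 'myIter'");
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```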
Also fixed a bug where I was calling regex.matches(str) instead of str.matches(regex).
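This argument-order bug is easy to make because `String.matches(regex)` treats its receiver as the input and its argument as the pattern. A small illustration; the property name and pattern are made up for the example:

```java
class MatchesOrder {
  public static void main(String[] args) {
    String prop = "table.iterator.scan.vers"; // the input string to test
    String regex = "table\\.iterator\\.scan\\.[^.]+"; // hypothetical pattern for illustration

    // Correct: the property is the input, the regex is the pattern.
    System.out.println(prop.matches(regex)); // true
    // Swapped: the property text is compiled as the pattern and the regex text is the input.
    System.out.println(regex.matches(prop)); // false
  }
}
```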
This adds checks, when adding an iterator, that the given iterator does not conflict with any existing iterators. A conflict means an iterator with the same name or the same priority. Iterators can be added in several ways, and previously only TableOperations.attachIterator and NamespaceOperations.attachIterator checked for conflicts. This adds iterator conflict checks to:
This also accounts for the several ways in which conflicts can arise:
This commit also adds a new IteratorConflictsIT to test all of the above.
Part of #6030