@kevinrr888
Member

@kevinrr888 kevinrr888 commented Jan 7, 2026

When an iterator is added, this now checks that it does not conflict with any existing iterator. Two iterators conflict if they share a name or a priority. Iterators can be added in several ways, and previously only TableOperations.attachIterator and NamespaceOperations.attachIterator checked for conflicts. This adds iterator conflict checks to:

  • Scanners at the time they are used
  • TableOperations.setProperty
  • TableOperations.modifyProperties
  • NewTableConfiguration.attachIterator
  • NamespaceOperations.attachIterator (was previously only checking for conflicts with iterators in the namespace, now also checks for conflicts with iterators in the tables of the namespace)
  • NamespaceOperations.setProperty
  • NamespaceOperations.modifyProperties
  • CloneConfiguration.Builder.setPropertiesToSet

This also accounts for the several ways in which conflicts can arise:

  • Iterators that are attached directly to a table (either through TableOperations.attachIterator, TableOperations.setProperty, or TableOperations.modifyProperties)
  • Iterators that are attached to a namespace, inherited by a table (either through NamespaceOperations.attachIterator, NamespaceOperations.setProperty, or NamespaceOperations.modifyProperties)
  • Conflicts with default table iterators (if the table has them)
  • Adding the exact iterator already present should not fail
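
A minimal sketch of the namespace-inheritance case listed above. It assumes iterator properties use keys of the form table.iterator.<scope>.<name> with values "priority,class", and that table-level properties override namespace-level ones with the same key. The class and method names (EffectiveIteratorView, effective, duplicatedScanPriorities) are hypothetical and are not taken from this PR. The point it illustrates is that neither level has to contain a conflict on its own for the merged view to contain one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: not the Accumulo implementation. */
final class EffectiveIteratorView {

  // Assumed key prefix for scan-scope iterators; values are "priority,class".
  static final String SCAN_PREFIX = "table.iterator.scan.";

  /** Builds the view a table sees: namespace properties, overridden by table properties. */
  static Map<String,String> effective(Map<String,String> namespaceProps,
      Map<String,String> tableProps) {
    Map<String,String> merged = new TreeMap<>(namespaceProps);
    merged.putAll(tableProps);
    return merged;
  }

  /** Returns the scan priorities that are claimed by more than one iterator name. */
  static List<Integer> duplicatedScanPriorities(Map<String,String> props) {
    Map<Integer,String> seen = new HashMap<>();
    List<Integer> dups = new ArrayList<>();
    for (Map.Entry<String,String> e : props.entrySet()) {
      String key = e.getKey();
      if (!key.startsWith(SCAN_PREFIX)) {
        continue;
      }
      String name = key.substring(SCAN_PREFIX.length());
      if (name.contains(".")) {
        continue; // option keys such as <name>.opt.<key> carry no priority
      }
      int priority = Integer.parseInt(e.getValue().split(",", 2)[0].trim());
      String previous = seen.putIfAbsent(priority, name);
      if (previous != null && !dups.contains(priority)) {
        dups.add(priority);
      }
    }
    return dups;
  }
}
```

With one iterator at priority 20 on the namespace and a differently named iterator at priority 20 on the table, each map passes on its own, but the merged view reports priority 20 as duplicated.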

This commit also adds a new IteratorConflictsIT to test all of the above.

Part of #6030

This commit adds checks when adding an iterator that the given iterator does not conflict with any existing iterators. Conflict meaning same name or same priority. Iterators can be added several ways, and previously only TableOperations.attachIterator and NamespaceOperations.attachIterator would check for conflicts. This commit adds iterator conflict checks to:
- Scanner.addScanIterator
- TableOperations.setProperty
- TableOperations.modifyProperties
- NewTableConfiguration.attachIterator

Note that this does not add conflict checks to NamespaceOperations.setProperty or NamespaceOperations.modifyProperties; these will be done in another commit.

This commit also accounts for the several ways in which conflicts can arise:
- Iterators that are attached directly to a table (either through TableOperations.attachIterator, TableOperations.setProperty, or TableOperations.modifyProperties)
- Iterators that are attached to a namespace, inherited by a table (either through NamespaceOperations.attachIterator, NamespaceOperations.setProperty, or NamespaceOperations.modifyProperties)
- Conflicts with default table iterators (if the table has them)
- Adding the exact iterator already present should not fail

This commit also adds a new IteratorConflictsIT to test all of the above.

Part of apache#6030
Adds conflict checks to:
- NamespaceOperations.attachIterator (was previously only checking for conflicts with iterators in the namespace, now also checks for conflicts with iterators in the tables of the namespace)
- NamespaceOperations.setProperty (check conflicts with namespace iterators and all tables in the namespace)
- NamespaceOperations.modifyProperties (check conflicts with namespace iterators and all tables in the namespace)

Adds new tests to IteratorConflictsIT to cover the above.
@kevinrr888 kevinrr888 added this to the 2.1.5 milestone Jan 7, 2026
@kevinrr888 kevinrr888 self-assigned this Jan 7, 2026
Member Author

@kevinrr888 kevinrr888 left a comment

From running the sunny day tests and all the tests I have changed in this PR, I noticed that I unknowingly added new permission requirements to at least TableOperations.create() (now requires ALTER_NAMESPACE) and Scanner.addScanIterator() (now requires ALTER_TABLE). I imagine this is a blocker for these changes at this point, but let me know if it's not. I'll look into an alternative that avoids these permissions. See the changes to ConditionalWriterIT, ScanIteratorIT, and ShellServerIT for examples of the failures I encountered.

Checks are now done server side as of cb2eccb, avoiding these permission requirements.

Comment on lines -207 to +222
TableOperationsHelper.checkIteratorConflicts(noDefaultsPropMap, setting, scopes);
TableOperationsHelper.checkIteratorConflicts(propertyMap, setting, scopes);
Member Author

I could remove noDefaultsPropMap since I pushed the check for equality into checkIteratorConflicts

Comment on lines 382 to 388
String valStr = String.format("%s,%s", setting.getPriority(), setting.getIteratorClass());
Map<String,String> optionConflicts = new TreeMap<>();
// skip if the setting is present in the map... not a conflict if exactly the same
if (props.containsKey(nameStr) && props.get(nameStr).equals(valStr)
    && IteratorConfigUtil.containsSameIterOpts(props, setting, optStr)) {
  continue;
}
Member Author

@kevinrr888 kevinrr888 Jan 7, 2026

This method is the same as before except for the addition of "valStr" and this if check.

Moved here since same code was used for TableOperationsHelper.checkIteratorConflicts and NamespaceOperationsHelper.checkIteratorConflicts.
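
A self-contained sketch of the "not a conflict if exactly the same" rule discussed in this comment, for a single scope. It is an approximation under assumptions, not the method in this PR: the class IteratorConflictSketch and its check and sameOpts methods are hypothetical, the key layout (table.iterator.<scope>.<name> for "priority,class" and <name>.opt.<key> for options) is assumed, and IllegalArgumentException stands in for whatever exception the real code throws.

```java
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: approximates the check described above for one scope. */
final class IteratorConflictSketch {

  static void check(Map<String,String> props, String scope, String name, int priority,
      String iterClass, Map<String,String> opts) {
    String prefix = "table.iterator." + scope + ".";
    String nameStr = prefix + name;
    String optStr = nameStr + ".opt.";
    String valStr = priority + "," + iterClass;

    // Same name, same "priority,class" value, and same options: nothing to report.
    if (valStr.equals(props.get(nameStr)) && sameOpts(props, optStr, opts)) {
      return;
    }

    for (Map.Entry<String,String> e : props.entrySet()) {
      String key = e.getKey();
      if (!key.startsWith(prefix)) {
        continue;
      }
      String existingName = key.substring(prefix.length());
      if (existingName.contains(".")) {
        continue; // option key, not an iterator definition
      }
      if (existingName.equals(name)) {
        throw new IllegalArgumentException("iterator name conflict: " + name);
      }
      String existingPriority = e.getValue().split(",", 2)[0].trim();
      if (existingPriority.equals(Integer.toString(priority))) {
        throw new IllegalArgumentException("iterator priority conflict: " + priority);
      }
    }
  }

  /** True if the options stored under optStr are exactly the given options. */
  static boolean sameOpts(Map<String,String> props, String optStr, Map<String,String> opts) {
    Map<String,String> existing = new TreeMap<>();
    for (Map.Entry<String,String> e : props.entrySet()) {
      if (e.getKey().startsWith(optStr)) {
        existing.put(e.getKey().substring(optStr.length()), e.getValue());
      }
    }
    return existing.equals(opts);
  }
}
```

Re-adding an iterator with the same name, priority, class, and options returns quietly, while changing only the priority of an existing name, or reusing a priority under a new name, throws.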

@kevinrr888
Member Author

kevinrr888 commented Jan 8, 2026

Transferring to WIP until I resolve #6040 (review)
Edit: Addressed

@kevinrr888 kevinrr888 marked this pull request as draft January 8, 2026 15:53
@ctubbsii
Member

ctubbsii commented Jan 8, 2026

Discussed iterator conflicts today, and here's a summary of some key points:

  1. Conflict within the config: In configuration, no two iterators at the same scope (scan, minc, majc) may have the same priority.
    • This applies only to the complete view of the TableConfiguration, with all inherited properties from parent configs (namespace, system, ...), so it is okay, for example, if a table config set at the namespace level is overridden in part at the table level, so that the one single iterator at that scope and priority has configuration that spans across two levels of the configs. What is important is that the resulting view of the TableConfiguration when trying to construct an iterator stack, will not show any two different iterators at the same scope with the same priority.
    • Checks could be in place when editing table/namespace configuration to ensure a priority isn't "doubled up". A user who wishes to replace iteratorA with iteratorB at the same priority would have to remove iteratorA before adding iteratorB, or would have to use modifyProperties to atomically mutate the properties to remove and add at the same time, in order to avoid an error. This alone, however, does not guarantee that there isn't a conflict. If iteratorA had been set at the namespaceN.tableT, but iteratorB was being added to namespaceN, we would have to check that there isn't a conflict with any of the tables in namespaceN. That's not exactly practical, so we may just want to check that there isn't a conflict at the level being modified, and rely on later checks when setting up the iterator stack to verify that there isn't a conflict overall.
    • Note: this probably would be easier to deconflict if our iterator configs used a different property key scheme that was more overrideable atomically, like table.<scope>.iterator.<priority>=class,opt1key=opt1val,opt2key=opt2val,.... so that it wouldn't be possible to have conflict between namespace and table configs, because one would fully override the other. But, that's not what we have today.
  2. Conflict in user-supplied iterators for a specific scan/compaction: No two iterators in a single client-initiated operation's settings may have the same priority.
    • This is for things like scan-time iterators set in the API for a scan, and passed over the RPC, rather than iterators set in configuration on the server-side.
    • This also applies to any other place where we might be able to specify iterators that aren't in the configuration (compactions, conditional mutations, etc.)
    • We could check for conflicts set on the scanner easily, but would have to rely on the server-side setting up the iterator stack to ensure no conflicts between the user iterators and those set on the table.
  3. Conflict between configured iterators and user-supplied iterators: The complete iterator stack for an operation may not have any iterators running at the same priority, regardless of whether it came from the configuration or from the client API/RPC request.
    • To address this, we can simply check the full iterator stack when it is being constructed on the server-side, and fail the operation if any priorities are reused, regardless of where they came from.
    • Alternatively, we could treat one as overriding another, but I don't think that's a very good idea.
    • As a follow-on improvement here, we could treat all configured iterators as higher priority than all client operation-specific (scan-time/compaction-time) iterators:
      1. Instead of ordering three configured iterators and two user-supplied iterators by priority alone, as in C1, U2, C3, U4, C5, we would instead order them as C1, C3, C5, U2, U4.
      2. This enables stronger security guarantees by preventing a user-supplied iterator from seeing data that is filtered out in an administrator-configured iterator.
      3. This prevents bugs that could be caused by a user-supplied iterator that transforms data in a way that a subsequent administrator-configured iterator won't be able to handle.
      4. This is a behavior change, and may break some people's (ill-advised) uses, but I think it is better overall.
      5. This would also open the possibility of having a cleaner client-side API, because you don't actually have to specify priority numbers on the client. Instead, clients only need to order user-supplied iterators with respect to other user-supplied iterators, and won't need a priority number to indicate a global ordering that includes the configured iterators for a table. So, we could have an API something like: scanner.map(iterator1).map(iterator2).map(iterator3).scan().
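
The follow-on ordering idea in point 3 (C1, C3, C5, U2, U4 instead of C1, U2, C3, U4, C5) can be sketched as a two-level sort: configured iterators first, then user-supplied ones, each group ordered by priority. This is only an illustration of the proposal; the Iter class and the configuredFirst method are hypothetical names, and nothing here reflects how the iterator stack is built today.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: orders configured iterators ahead of user-supplied ones. */
final class StackOrderSketch {

  static final class Iter {
    final String label;
    final int priority;
    final boolean userSupplied;

    Iter(String label, int priority, boolean userSupplied) {
      this.label = label;
      this.priority = priority;
      this.userSupplied = userSupplied;
    }
  }

  /** Returns labels ordered with all configured iterators first, then by priority. */
  static List<String> configuredFirst(List<Iter> iters) {
    List<Iter> sorted = new ArrayList<>(iters);
    sorted.sort((a, b) -> {
      // false (configured) sorts before true (user-supplied)
      int byOrigin = Boolean.compare(a.userSupplied, b.userSupplied);
      return byOrigin != 0 ? byOrigin : Integer.compare(a.priority, b.priority);
    });
    List<String> labels = new ArrayList<>();
    for (Iter i : sorted) {
      labels.add(i.label);
    }
    return labels;
  }
}
```

Because user-supplied iterators are only compared with each other once the configured group is placed, their numbers act as a relative order rather than a global one, which is what would make a priority-free client API possible.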

- Moves the iterator conflict check for create table from client side to server side.
- Checking if iterators added to scanner conflict with those already set on the table moved from client side to server side.
- Adds iterator conflict checks to CloneConfiguration.Builder.setPropertiesToSet. This check is done server side.
- Adds testing to IteratorConflictsIT for CloneConfiguration.Builder.setPropertiesToSet
@kevinrr888 kevinrr888 marked this pull request as ready for review January 13, 2026 18:49
Comment on lines +214 to +215
assertThrows(exceptionClass, iterPrioConflictExec);
assertThrows(exceptionClass, iterNameConflictExec);
Member Author

Removed the exception message check, at least for now. #6048 is needed for the server to throw the exception back to the user for CREATE_TABLE and CLONE_TABLE. There is a similar issue for scanner exceptions.

// iterator options.

// First ensure the set iterators do not conflict with the existing table iterators.
for (var scanParamIterInfo : scanParams.getSsiList()) {
Contributor

@keith-turner keith-turner Jan 14, 2026

This code is doing a lot of work and I suspect it could be problematic for small scans w/ a few iterators.

  • Parsed iterator configuration is unparsed and then parsed.
  • For each scan iterator it loops over all tablet iterators. So this seems like O(M*N) type behavior. The unparsing and parsing is done M*N times.

The ParsedIteratorConfig class was created to cache parsed table config because profiling showed that small scans spent significant time parsing the table config. ParsedIteratorConfig is automatically cached per table and only recreated when the table config changes, which avoids each scan having to redo the work of parsing the properties.

Avoiding the O(M*N) work and avoiding unparsing the data would probably make this much faster. Not sure exactly how to solve this puzzle, but I suspect the following refactor might help.

  1. Modify the validation code to work on parsed iterator configuration. Currently checkIteratorConflicts parses and validates all together. Maybe it could be refactored to take List<IteratorSetting> instead of Map<String,String> props. This may make the code easier to understand.
  2. With the above change checking for iterator conflicts would parse in one method and then check/validate in another method.
  3. If we had a checkIteratorConflicts method that took parsed config, then the scan code could call this directly with its existing parsed iterator config. This would avoid unparsing and then reparsing the data.

The above might be a good general improvement to the code, but I'm not completely sure it solves the problem. Also not sure if it will completely solve the O(M*N) problem.

Also curious if the validation could efficiently be done in the existing IteratorConfigUtil.mergeIteratorConfig() code, but not sure about that. Suspect having checkIteratorConflicts work on parsed config would make all of this code easier to understand and more efficient, so that may help answer questions like this.

Contributor

@keith-turner keith-turner Jan 14, 2026

Maybe a refactoring like the following could help speed up the scan code. Maybe Map<String, IteratorSetting> parsed will help avoid redundant parsing work as each scan iterator is checked.

  public static void checkIteratorConflicts(Map<String,IteratorSetting> parsed,
      IteratorSetting settings) throws AccumuloException {
    // parsed is keyed on IteratorSetting.name
    var existing = parsed.get(settings.getName());
    if (existing != null) {
      // TODO check for conflicts like the current code does on unparsed config
    }
  }

  public static void checkIteratorConflicts(Map<String,String> props, IteratorSetting setting,
      EnumSet<IteratorScope> scopes) throws AccumuloException {
    for (IteratorScope scope : scopes) {
      Map<String,IteratorSetting> parsed = parseIteratorConfig(props, scope);
      checkIteratorConflicts(parsed, setting);
    }
  }

For performance, probably only the scan code really matters when refactoring this code. Do not really care about the performance of this code for setting iterators on a table or something like that. Nothing else will be executed as frequently.
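
One possible way to reduce the O(M*N) behavior described above, sketched under assumptions: index the table's already-parsed iterators once by name and by priority, so that checking each scan iterator becomes two map lookups. IndexedConflictCheck, addTableIterator, and conflictFor are hypothetical names. This is not what the PR does, and it ignores iterator options for brevity.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: a per-table index that could be built once and reused by scans. */
final class IndexedConflictCheck {

  // name -> "priority,class" for each table iterator in one scope
  private final Map<String,String> valueByName = new HashMap<>();
  // priority -> iterator name for the same iterators
  private final Map<Integer,String> nameByPriority = new HashMap<>();

  /** Registers one parsed table iterator; called once per iterator when the index is built. */
  void addTableIterator(String name, int priority, String iterClass) {
    valueByName.put(name, priority + "," + iterClass);
    nameByPriority.put(priority, name);
  }

  /**
   * Returns null when the scan iterator does not conflict, otherwise a short description.
   * Each call does constant work, so M scan iterators cost O(M) after the O(N) index build.
   */
  String conflictFor(String name, int priority, String iterClass) {
    String existing = valueByName.get(name);
    if (existing != null) {
      // Same name: only acceptable when priority and class are identical (options ignored here).
      return existing.equals(priority + "," + iterClass) ? null : "name conflict with " + name;
    }
    String owner = nameByPriority.get(priority);
    if (owner != null) {
      return "priority " + priority + " already used by " + owner;
    }
    return null;
  }
}
```

If such an index were cached alongside the parsed table config, it would only need rebuilding when the table config changes, in the same spirit as the ParsedIteratorConfig caching mentioned earlier in this thread.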

Member Author

Good point about optimizing the validation code for scans.

Pushed f422862 to address your suggestion. Let me know your thoughts

Member Author

Not really sure if we can avoid the O(N*M) work altogether, but it does less work in the loops now.

Member Author

Also curious if the validation could efficiently be done in the existing IteratorConfigUtil.mergeIteratorConfig()

Unfortunately not, since there is no existing loop here that we could extend to check for conflicts. This method loops through the table iterator options and the scan parameters' iterator options. To check for conflicts we need to iterate through the iterator infos.

@keith-turner
Contributor

@kevinrr888 why did you choose to do this work in 2.1 instead of in main? It seems there is a chance of introducing new bugs in scans or compactions. It may also make config that used to work stop working (that is probably a good thing overall, as it can help detect existing problems, but it could introduce temporary pain). I am not opposed to making this change in 2.1, but was just curious.

@kevinrr888
Member Author

@kevinrr888 why did you choose to do this work in 2.1 instead of in main? It seems there is a chance of introducing new bugs in scans or compactions. It may also make config that used to work stop working (that is probably a good thing overall, as it can help detect existing problems, but it could introduce temporary pain). I am not opposed to making this change in 2.1, but was just curious.

@keith-turner I had already started this work in 2.1 with #5990, thinking this was a one-off issue with NewTableConfiguration. I did not anticipate the follow-on work requiring changes in so many areas, so I continued in 2.1, learning the scope of the issue as I went. I also thought this validation would be good to have in the earliest version possible, since it is essentially a bug. I would be fine refactoring this for main if we think this is too risky or undesired for 2.1.

@keith-turner
Contributor

I would be fine refactoring this for main if we think this is too risky or undesired for 2.1.

There are benefits and risks with this change. Maybe the best way to get the benefits and lower the risk is to make these changes only warn in 2.1 and fail in 4.0? That way, things that were working in 2.1.4 and earlier do not blow up in 2.1.5; they still work, but log a warning that the iterator config is not correct and could lead to non-deterministic behavior.
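
The warn-in-2.1, fail-in-4.0 idea could look something like the sketch below, where a single reporting method either records a warning or throws depending on a mode. Everything here is hypothetical (ConflictPolicySketch, Mode, report, and the use of IllegalArgumentException); how the branches would actually select the behavior is not stated in this thread.

```java
import java.util.function.Consumer;

/** Illustrative only: one place that decides whether a conflict warns or fails. */
final class ConflictPolicySketch {

  enum Mode {
    WARN, // behavior suggested for 2.1: keep working, but tell the user
    FAIL  // behavior suggested for 4.0: reject the operation
  }

  /**
   * Reports a detected conflict. In WARN mode the message goes to the given sink
   * (for example a logger) and the caller continues; in FAIL mode it throws.
   */
  static void report(Mode mode, String message, Consumer<String> warnSink) {
    if (mode == Mode.FAIL) {
      throw new IllegalArgumentException(message);
    }
    warnSink.accept(message);
  }
}
```

Keeping the detection logic identical and varying only the reporting step would mean the same checks and tests can be shared across branches, with only the expected outcome differing.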

@kevinrr888
Member Author

make these changes only warn in 2.1 and fail in 4.0

This is good with me, I'll change this

Also fixed a bug where I was calling regex.matches(str) instead of
str.matches(regex)
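
For readers unfamiliar with this class of bug: String.matches treats its receiver as the input and its argument as the pattern, so swapping the two compiles but silently matches the wrong thing. The sketch below uses a made-up pattern for iterator property keys (ITER_KEY_REGEX is an assumption, not the regex from this commit) to show the difference.

```java
/** Illustrative only: shows why the receiver and argument of String.matches matter. */
final class MatchesDirection {

  // Hypothetical pattern for keys like table.iterator.scan.<name>
  static final String ITER_KEY_REGEX = "table\\.iterator\\.(scan|minc|majc)\\.[^.]+";

  /** Correct direction: the key is the input, the regex is the pattern. */
  static boolean isIteratorKey(String key) {
    return key.matches(ITER_KEY_REGEX);
  }

  /** Swapped direction: the key is compiled as a pattern and applied to the regex text. */
  static boolean swapped(String key) {
    return ITER_KEY_REGEX.matches(key);
  }
}
```

For the key table.iterator.scan.vers, isIteratorKey returns true while swapped returns false, because the regex source text contains backslashes and brackets that the key, used as a pattern, does not match.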