-
Notifications
You must be signed in to change notification settings - Fork 34
Support for MaxPool1D and RQSConv1D for PULPOpen #146
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: devel
Are you sure you want to change the base?
Conversation
📝 WalkthroughSummary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings. WalkthroughAdds 1D MaxPool support across Generic and PULPOpen targets (kernels, templates, bindings, tiler, tile constraints, parser, and platform mapping) and updates CI workflows to reference the Regular_2D MaxPool variant for tests. Changes
Sequence Diagram(s)sequenceDiagram
participant Parser as Parser\n(MaxPool1D/2D)
participant Mapper as Mapper\n(NodeMapper)
participant Tiler as Tiler\n(TileConstraint)
participant CodeGen as Templates\n(Codegen)
participant Kernel as Kernel\n(Target lib)
Parser->>Mapper: produce operatorRepresentation (1D or 2D)
Mapper->>Tiler: request tiling-ready bindings selection
Tiler->>CodeGen: serialize tiling solution & provide variable replacements
CodeGen->>Kernel: emit kernel call (MaxPool1d or MaxPool2d)
Kernel-->>CodeGen: link to target implementation
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@Deeploy/Targets/PULPOpen/Templates/MaxPoolTemplate.py`:
- Around line 34-46: The template PULPMaxPool1D_8_Template/PULPMaxPoolTemplate
currently passes ONNX 1D pads into the HEIGHT pad slots of pulp_nn_maxpool (so
padding_top/padding_bottom get ${padding_y}/${padding_y_right}); swap the pad
ordering so height pads are zeros and width pads receive the ONNX values: set
padding_top=0, padding_bottom=0 and padding_left=${padding_y},
padding_right=${padding_y_right} in the pulp_nn_maxpool${signatureString} call
(i.e. move the two zeros before the two ${padding_y} args).
In `@Deeploy/Targets/PULPOpen/TileConstraints/ConvTileConstraint.py`:
- Around line 840-844: The effectiveInputLengthVar calculation is wrong: it
subtracts (inputLengthVar - 1) for the tiled case but should subtract the kernel
span (weightLengthVar - 1); update the expression in ConvTileConstraint (the
effectiveInputLengthVar assignment) to replace the final term ((inputLengthVar -
1) * (inputLengthVar != inputBuffer.shape[1])) with ((weightLengthVar - 1) *
(inputLengthVar != inputBuffer.shape[1])) so the receptive field overlap uses
weightLengthVar consistently with the 2D implementation.
- Around line 871-873: The effectiveInputLength calculation in
addPolicyConstraint uses (inputLengthVar - 1) where it should subtract the
kernel length minus one (same bug as in addGeometricalConstraint); update the
ternary expression so the last term uses the kernel size (e.g., kernelLength or
kernelLengthVar) minus one instead of inputLengthVar - 1, i.e., change "-
(inputLengthVar - 1) * (inputLengthVar != inputBuffer.shape[1])" to "-
(kernelLength - 1) * (inputLengthVar != inputBuffer.shape[1])" in the
effectiveInputLength assignment inside addPolicyConstraint.
In `@TargetLibraries/Generic/src/MaxPool_s8.c`:
- Around line 52-69: The MaxPool1d_s8_s8 function currently computes L_out = (L
- K) / S + 1 which can underflow when K > L and divide-by-zero when S == 0;
before calculating L_out, add guards: if S == 0 return (or otherwise abort) and
if K > L set L_out = 0 (or return) to avoid the underflow/OOB behavior; update
the function entry (MaxPool1d_s8_s8) to validate S and the relationship of K and
L and bail out or produce zero-length output before any loops that use L_out.
🧹 Nitpick comments (5)
TargetLibraries/Generic/inc/kernel/MaxPool.h (1)
42-48: Consider aligning parameter naming between int8 and fp32 variants.The
MaxPool1d_s8_s8function usesLfor the spatial dimension whileMaxPool1d_fp32_fp32usesW. For consistency across the API, consider using the same parameter name (e.g.,Lfor "Length" in 1D operations) in both declarations.Deeploy/Targets/PULPOpen/TileConstraints/ConvTileConstraint.py (4)
618-618: Remove unused variabledilation.The
dilationvariable is assigned but never used inaddGeometricalConstraint. If dilation support is intended for future use, consider adding a TODO comment; otherwise, remove it.♻️ Suggested fix
- dilation = parseDict["dilations"][0] + # Note: dilation not currently used in 1D constraint; add if needed
651-658: Remove unused variablepadsinaddPolicyConstraint.The
padsvariable is assigned on line 657 but never used in this method.♻️ Suggested fix
inputBuffer = ctxt.lookup(name = parseDict['data_in']) weightBuffer = ctxt.lookup(name = parseDict['weight']) - pads = parseDict["pads"] stride = parseDict["strides"][0]
768-769: Consider addingstrict=Truetozip()calls.Using
zip()withoutstrict=Truemay silently truncate if the input lists have different lengths, potentially masking bugs.♻️ Suggested fix
- for a, b, add, mul in zip(inputInCubes, inputWeightCubes, inputAddCubes, inputMulCubes): + for a, b, add, mul in zip(inputInCubes, inputWeightCubes, inputAddCubes, inputMulCubes, strict=True):
1046-1056: Consider addingstrict=Truetozip()calls.Same recommendation as in
RQConv1DTileConstraintfor defensive programming.♻️ Suggested fix
if varBias == "NULL": - for a, b in zip(inputInCubes, inputWeightCubes): + for a, b in zip(inputInCubes, inputWeightCubes, strict=True): inputLoadSchedule.append({"data_in": a, "weight": b}) else: - for a, b, c in zip(inputInCubes, inputWeightCubes, inputBiasCubes): + for a, b, c in zip(inputInCubes, inputWeightCubes, inputBiasCubes, strict=True): inputLoadSchedule.append({"data_in": a, "weight": b, "bias": c})
| effectiveInputLengthVar = inputLengthVar \ | ||
| + (sum(pads) * (inputLengthVar == inputBuffer.shape[1])) \ | ||
| - ((inputLengthVar - 1) * (inputLengthVar != inputBuffer.shape[1])) | ||
|
|
||
| _outputLengthVar = (effectiveInputLengthVar - dilation * (weightLengthVar - 1) - 1) // stride + 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential bug in effectiveInputLengthVar formula.
The formula subtracts (inputLengthVar - 1) when tiling, but comparing with Conv2DTileConstraint.addGeometricalConstraint (lines 305-308), the 2D version subtracts (weightHeightVar - 1). This appears to be a copy-paste error—it should likely be (weightLengthVar - 1) to account for the kernel receptive field overlap.
🐛 Suggested fix
# Assume worst case scenario (data padding on all sides) when tiling on a ceratin dimension.
effectiveInputLengthVar = inputLengthVar \
+ (sum(pads) * (inputLengthVar == inputBuffer.shape[1])) \
- - ((inputLengthVar - 1) * (inputLengthVar != inputBuffer.shape[1]))
+ - ((weightLengthVar - 1) * (inputLengthVar != inputBuffer.shape[1]))🤖 Prompt for AI Agents
In `@Deeploy/Targets/PULPOpen/TileConstraints/ConvTileConstraint.py` around lines
840 - 844, The effectiveInputLengthVar calculation is wrong: it subtracts
(inputLengthVar - 1) for the tiled case but should subtract the kernel span
(weightLengthVar - 1); update the expression in ConvTileConstraint (the
effectiveInputLengthVar assignment) to replace the final term ((inputLengthVar -
1) * (inputLengthVar != inputBuffer.shape[1])) with ((weightLengthVar - 1) *
(inputLengthVar != inputBuffer.shape[1])) so the receptive field overlap uses
weightLengthVar consistently with the 2D implementation.
| effectiveInputLength = inputLengthVar \ | ||
| + (sum(pads) * (inputLengthVar == inputBuffer.shape[1])) \ | ||
| - (inputLengthVar - 1) * (inputLengthVar != inputBuffer.shape[1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same formula issue in addPolicyConstraint.
This has the same issue as addGeometricalConstraint—it subtracts (inputLengthVar - 1) instead of the kernel length minus one.
🐛 Suggested fix
effectiveInputLength = inputLengthVar \
+ (sum(pads) * (inputLengthVar == inputBuffer.shape[1])) \
- - (inputLengthVar - 1) * (inputLengthVar != inputBuffer.shape[1])
+ - (weightLengthVar - 1) * (inputLengthVar != inputBuffer.shape[1])🤖 Prompt for AI Agents
In `@Deeploy/Targets/PULPOpen/TileConstraints/ConvTileConstraint.py` around lines
871 - 873, The effectiveInputLength calculation in addPolicyConstraint uses
(inputLengthVar - 1) where it should subtract the kernel length minus one (same
bug as in addGeometricalConstraint); update the ternary expression so the last
term uses the kernel size (e.g., kernelLength or kernelLengthVar) minus one
instead of inputLengthVar - 1, i.e., change "- (inputLengthVar - 1) *
(inputLengthVar != inputBuffer.shape[1])" to "- (kernelLength - 1) *
(inputLengthVar != inputBuffer.shape[1])" in the effectiveInputLength assignment
inside addPolicyConstraint.
Victor-Jung
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ciao Alex, I have a few minor comments but overall it looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
Deeploy/Targets/Generic/Parsers.py (1)
193-200: PotentialKeyErrorwhenstridesattribute is missing.The stride validation on line 199 accesses
node.attrs['strides']unconditionally. Since this is a list literal passed toall(), Python evaluates all elements before callingall()— no short-circuit evaluation occurs. If'strides' not in node.attrs, this will raise aKeyError.🐛 Proposed fix: move stride validation after the attribute existence check
def parseNode(self, node: gs.Node) -> bool: ret = all([ 'ceil_mode' in node.attrs, 'kernel_shape' in node.attrs, 'pads' in node.attrs, 'strides' in node.attrs, len(node.inputs) == 1, len(node.outputs) >= 1, - all([stride > 0 for stride in node.attrs['strides']]), ]) + if ret: + ret = all(stride > 0 for stride in node.attrs['strides']) + if ret: self.operatorRepresentation['ceil_mode'] = node.attrs['ceil_mode']
🤖 Fix all issues with AI agents
In `@TargetLibraries/Generic/src/MaxPool_s8.c`:
- Around line 55-58: The code lacks a guard for K == 0 which allows a
zero-length kernel and produces invalid outputs; update the input validation in
MaxPool_s8 (the block checking L, K, S) to treat K == 0 as invalid—mirror the
existing stride guard (S == 0) by returning (or asserting) when K == 0 (e.g.,
add K == 0 to the condition that currently checks L < K || S == 0) so the
computation of L_out is never performed with a zero kernel.
|
I handled both conditions directly in the C kernel by early-returning when the parameters are invalid. I couldn’t find an existing pattern for this in other kernels, so this was a best-effort approach on my side — please let me know if there’s a preferred or more idiomatic way to handle it. Should I apply the same kind of guard to MaxPool2d as well? Additionally, since the stride value is available from ONNX, I added a guard in the parser to catch stride == 0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
Deeploy/Targets/Generic/Parsers.py (1)
195-211: KeyError whenstridesattribute is missing.The stride validation on line 202 accesses
node.attrs['strides']unconditionally. Sinceall([...])evaluates all list elements before callingall(), this will raise aKeyErrorif the'strides'attribute is not present—even though the check'strides' in node.attrsexists earlier in the list.🐛 Proposed fix: Move stride validation after initial checks
def parseNode(self, node: gs.Node) -> bool: ret = all([ 'ceil_mode' in node.attrs, 'kernel_shape' in node.attrs, 'pads' in node.attrs, 'strides' in node.attrs, len(node.inputs) == 1, len(node.outputs) >= 1, - all([stride > 0 for stride in node.attrs['strides']]), ]) - if ret: + if ret and not all(stride > 0 for stride in node.attrs['strides']): + return False + + if ret: self.operatorRepresentation['ceil_mode'] = node.attrs['ceil_mode'] self.operatorRepresentation['pads'] = node.attrs['pads'] self.operatorRepresentation['kernel_shape'] = node.attrs['kernel_shape'] self.operatorRepresentation['strides'] = node.attrs['strides'] return ret
🤖 Fix all issues with AI agents
In `@TargetLibraries/Generic/src/MaxPool_fp32.c`:
- Around line 46-50: The guard that validates input parameters in MaxPool_fp32.c
should also treat a kernel size of zero as invalid; update the condition
currently checking "if (W < K || S == 0)" to include "K == 0" so the function
returns early for empty kernels (this affects the validation in the function
containing the W_out calculation and the loop "for (uint32_t c = 0; c < C;
++c)"). Ensure the check prevents proceeding when K == 0 to avoid leaving max
initialized at -inf and producing invalid outputs.
♻️ Duplicate comments (1)
TargetLibraries/Generic/src/MaxPool_s8.c (1)
55-58: Still missing guard forK == 0.As flagged in a previous review, a zero-length kernel bypasses the current guard, causes the inner loop to never execute, and writes
-128 + output_offsetto all output positions. AddK == 0to the validation.Proposed fix
- if (L < K || S == 0) { + if (K == 0 || S == 0 || L < K) { return; }
Sorry, I missed the guard statement 😁 Sure, if you can add the guard to the other maxpool kernels, it would be amazing! Thanks for the contribution. I am approving this PR, and we can merge it soon. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@Deeploy/Targets/Generic/Parsers.py`:
- Around line 195-203: The current boolean expression assigned to ret eagerly
accesses node.attrs['strides'] and can raise KeyError; modify the condition so
the positivity check runs only when 'strides' is present (e.g., replace the
inner all([stride > 0 for stride in node.attrs['strides']]) with a
short-circuiting check like ('strides' in node.attrs and all(stride > 0 for
stride in node.attrs['strides'])) or restructure the overall predicate to use
generator expressions that only evaluate node.attrs['strides'] after confirming
'strides' in node.attrs); update the expression that sets ret (and reference
node.attrs, 'strides', and the ret assignment) accordingly so missing strides
yields False without an exception.
In `@TargetLibraries/Generic/src/MaxPool_fp32.c`:
- Around line 14-16: The early-return in MaxPool_fp32.c currently checks if (H <
P || W < Q || SP == 0 || SQ == 0) but misses guarding P == 0 || Q == 0, which
allows empty pooling windows to leave max as -inf; update the condition in the
MaxPool_fp32 function to also return when P == 0 or Q == 0 (i.e., include P == 0
|| Q == 0 alongside the existing H, W, SP, SQ checks) so no invalid -inf outputs
are written.
🧹 Nitpick comments (1)
TargetLibraries/Generic/src/MaxPool_s8.c (1)
13-15: Add guards forP == 0andQ == 0.For consistency with the 1D variant feedback, a zero kernel size (
P == 0orQ == 0) would cause the inner loops to never execute, leavingmax = -128and producing semantically incorrect output. Consider guarding these cases as well.♻️ Proposed fix
- if (H < P || W < Q || SP == 0 || SQ == 0) { + if (P == 0 || Q == 0 || SP == 0 || SQ == 0 || H < P || W < Q) { return; }
| ret = all([ | ||
| 'ceil_mode' in node.attrs, 'kernel_shape' in node.attrs, 'pads' in node.attrs, 'strides' in node.attrs, | ||
| 'ceil_mode' in node.attrs, | ||
| 'kernel_shape' in node.attrs, | ||
| 'pads' in node.attrs, | ||
| 'strides' in node.attrs, | ||
| len(node.inputs) == 1, | ||
| len(node.outputs) >= 1 | ||
| len(node.outputs) >= 1, | ||
| all([stride > 0 for stride in node.attrs['strides']]), | ||
| ]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid KeyError when strides is absent (Line 195–203).
all([...]) eagerly evaluates the list, so node.attrs['strides'] will throw if the attribute is missing; previously this would just return False. Guard the access and run the positivity check only when strides is present.
🛠️ Proposed fix
- ret = all([
- 'ceil_mode' in node.attrs,
- 'kernel_shape' in node.attrs,
- 'pads' in node.attrs,
- 'strides' in node.attrs,
- len(node.inputs) == 1,
- len(node.outputs) >= 1,
- all([stride > 0 for stride in node.attrs['strides']]),
- ])
+ strides = node.attrs.get('strides')
+ ret = all([
+ 'ceil_mode' in node.attrs,
+ 'kernel_shape' in node.attrs,
+ 'pads' in node.attrs,
+ strides is not None,
+ len(node.inputs) == 1,
+ len(node.outputs) >= 1,
+ ])
+ if ret and not all(stride > 0 for stride in strides):
+ return False🤖 Prompt for AI Agents
In `@Deeploy/Targets/Generic/Parsers.py` around lines 195 - 203, The current
boolean expression assigned to ret eagerly accesses node.attrs['strides'] and
can raise KeyError; modify the condition so the positivity check runs only when
'strides' is present (e.g., replace the inner all([stride > 0 for stride in
node.attrs['strides']]) with a short-circuiting check like ('strides' in
node.attrs and all(stride > 0 for stride in node.attrs['strides'])) or
restructure the overall predicate to use generator expressions that only
evaluate node.attrs['strides'] after confirming 'strides' in node.attrs); update
the expression that sets ret (and reference node.attrs, 'strides', and the ret
assignment) accordingly so missing strides yields False without an exception.
| if (H < P || W < Q || SP == 0 || SQ == 0) { | ||
| return; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guard P == 0 || Q == 0 to avoid producing -inf outputs.
Consistent with the 1D guard feedback and the existing checks, if either pool dimension is zero the kernel window is empty and max remains -inf, writing invalid values. Add P == 0 || Q == 0 to the early-return condition.
🩹 Proposed fix
- if (H < P || W < Q || SP == 0 || SQ == 0) {
+ if (H < P || W < Q || P == 0 || Q == 0 || SP == 0 || SQ == 0) {
return;
}🤖 Prompt for AI Agents
In `@TargetLibraries/Generic/src/MaxPool_fp32.c` around lines 14 - 16, The
early-return in MaxPool_fp32.c currently checks if (H < P || W < Q || SP == 0 ||
SQ == 0) but misses guarding P == 0 || Q == 0, which allows empty pooling
windows to leave max as -inf; update the condition in the MaxPool_fp32 function
to also return when P == 0 or Q == 0 (i.e., include P == 0 || Q == 0 alongside
the existing H, W, SP, SQ checks) so no invalid -inf outputs are written.
Add integer MaxPool1D for Generic platform and RQSConv1D support for PULPOpen, with corresponding kernel tests.
Added
Changed
Fixed
PR Merge Checklist
develcommit and pointing todevel.CHANGELOG.mdfile has been updated.