Move permute optimization passes to shared transforms location (#19002) #19002

Open
mcremon-meta wants to merge 1 commit into main from export-D101459577

Conversation

@mcremon-meta

@mcremon-meta mcremon-meta commented Apr 20, 2026

Summary:

Move 6 permute optimization passes and their shared infrastructure from
executorch/backends/cadence/aot/ to executorch/backends/transforms/ so
they can be shared between the Cadence and Arm backends without a
cross-backend dependency.

New files:

  • permute_pass_utils.py: base classes (HierarchicalInplacePassInterface,
    RemoveOrReplacePassInterface, FuseOpPairsAcrossBranchesPass) and
    utilities (get_arg, set_arg, get_transposed_dims, get_permuted_dims,
    get_shape, get_edge_overload_packet)
  • fuse_cascaded_transpose_or_permute_ops.py
  • fuse_cascaded_view_ops.py
  • fuse_transpose_or_permute_op_pairs_pass.py
  • remove_permutes_around_elementwise_ops.py
  • postpone_permute_below_squeeze_view.py
  • replace_nop_transpose_or_permute_with_view.py

The shared versions omit register_cadence_pass decorators and
cadence-specific ops from default op sets. Cadence files will subclass
these and re-add the decorators and ops.

Added OSS tests (test_permute_optimization_passes.py) for the 4 passes
that can be imported without quantized op registration:
FuseCascadedTransposeOrPermuteOps, FuseCascadedViewOps,
PostponePermuteOpBelowSqueezeOrUnsqueezeLikeView, and
ReplaceNopTransposeOrPermuteWithViewPass. These run in GitHub CI via
pytest and are discovered automatically through pytest.ini testpaths.
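
To make the cascaded-permute fusion concrete, here is a minimal pure-Python sketch of the index arithmetic involved. It assumes `torch.permute` semantics (output dimension `i` reads input dimension `dims[i]`). The helper names `permute_shape`, `compose_permutations`, and `is_identity_permutation` are hypothetical and are not the functions defined in permute_pass_utils.py.

```python
from typing import Sequence


def permute_shape(shape: Sequence[int], dims: Sequence[int]) -> tuple[int, ...]:
    """Shape after a permute: output dim i takes input dim dims[i]."""
    return tuple(shape[d] for d in dims)


def compose_permutations(first: Sequence[int], second: Sequence[int]) -> list[int]:
    """Single permutation equivalent to applying `first`, then `second`."""
    if len(first) != len(second):
        raise ValueError("permutations must have the same rank")
    return [first[d] for d in second]


def is_identity_permutation(dims: Sequence[int]) -> bool:
    """True when the permute leaves every dimension in place."""
    return list(dims) == list(range(len(dims)))


shape = (2, 3, 4)
first, second = [0, 2, 1], [2, 0, 1]
fused = compose_permutations(first, second)

# Applying the fused permutation matches applying the two in sequence.
assert permute_shape(shape, fused) == permute_shape(permute_shape(shape, first), second)

# A permutation followed by its inverse collapses to the identity,
# which is the case where both ops could be dropped entirely.
assert is_identity_permutation(compose_permutations([1, 2, 0], [2, 0, 1]))
```

The sketch only checks shapes; a real pass would also have to rewrite graph nodes and preserve metadata, which is not shown here.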

Reviewed By: ethansfng

Differential Revision: D101459577

@pytorch-bot

pytorch-bot bot commented Apr 20, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19002

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 6 Pending

As of commit 298b640 with merge base 8e5ec80:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed label Apr 20, 2026
@meta-codesync

meta-codesync bot commented Apr 20, 2026

@mcremon-meta has exported this pull request. If you are a Meta employee, you can view the originating Diff in D101459577.

@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@digantdesai

@claude review this PR.

@claude

claude bot commented Apr 20, 2026

Claude finished @digantdesai's task in 3m 9s


Review of PR #19002: Move permute optimization passes to shared transforms location

  • Read new shared transform files
  • Read modified cadence files (subclasses)
  • Read modified arm backend file
  • Read test file
  • Read build files (BUCK, targets.bzl)
  • Analyze CI failures
  • Post review feedback

Overall this is a well-structured refactor that cleanly separates shared permute optimization logic from cadence-specific decorators/ops. The inheritance pattern is sound. I have one blocking issue and several non-blocking items.

Blocking: Python 3.10 compatibility (typing.override)

backends/transforms/permute_pass_utils.py:17: override is imported from typing, but typing.override was only added in Python 3.12. The CI runs on Python 3.10 and fails with:

ImportError: cannot import name 'override' from 'typing' (/opt/conda/envs/py_3.10/lib/python3.10/typing.py)

This breaks all 5 unittest jobs. The fix is to import from typing_extensions instead:

from typing_extensions import override

Or conditionally:

import sys
if sys.version_info >= (3, 12):
    from typing import override
else:
    from typing_extensions import override
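
As a further illustration, the sketch below shows an import chain that succeeds on any interpreter, followed by how the decorator is applied. The last-resort no-op fallback and the class names `BasePass` and `MyPass` are assumptions made for this example; they are not taken from the PR.

```python
try:
    from typing import override  # available from Python 3.12
except ImportError:
    try:
        from typing_extensions import override  # backport for older interpreters
    except ImportError:

        def override(method):
            """Last-resort no-op so the module still imports (assumption, not from the PR)."""
            return method


class BasePass:
    def call(self, graph: str) -> str:
        return graph


class MyPass(BasePass):
    @override
    def call(self, graph: str) -> str:
        # The decorator only marks intent for type checkers; runtime behavior is unchanged.
        return graph.upper()


assert MyPass().call("graph") == "GRAPH"
```

Importing directly from typing_extensions is the simpler option when that package is already a dependency; the chained form only matters if it might be absent.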


Non-blocking: Lint issues (UFMT + F401)

The linter reports several issues introduced by this PR in the cadence files:

  1. Import ordering (UFMT) in fuse_ops.py, remove_ops.py, reorder_ops.py, replace_ops.py — the new shared imports are not in the correct sorted position. Running lintrunner -a should auto-fix these.

  2. Unused imports (F401) in the cadence files — after extracting code to shared modules, several imports became unused:

    • fuse_ops.py: collections.deque, typing.Callable, compiler_utils.get_permuted_dims, compiler_utils.get_transposed_dims, ExportPass
    • remove_ops.py: dataclasses.dataclass, dataclasses.field
    • reorder_ops.py: copy, typing.cast

    These should be cleaned up to keep the code tidy.

  3. Formatting in fuse_ops.py and remove_ops.py — the set union expressions for bypass_ops and permutable_ops in the cadence subclasses need wrapping in parentheses per the formatter:

    # Current (lint fails):
    bypass_ops: set[EdgeOpOverload] = _SharedFuseTransposeOrPermuteOpPairsPass.bypass_ops | {...}
    # Expected:
    bypass_ops: set[EdgeOpOverload] = (
        _SharedFuseTransposeOrPermuteOpPairsPass.bypass_ops
        | {...}
    )
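
The subclass-and-extend pattern behind these op sets can be sketched without any ExecuTorch imports. Everything here is hypothetical: plain strings stand in for `EdgeOpOverload` values, `register_backend_pass` stands in for a backend decorator such as register_cadence_pass, and the class names and op names are placeholders.

```python
PASS_REGISTRY: list[type] = []


def register_backend_pass(cls: type) -> type:
    """Hypothetical stand-in for a backend-specific registration decorator."""
    PASS_REGISTRY.append(cls)
    return cls


class SharedFusePairsPass:
    # Backend-agnostic defaults (placeholder op names).
    bypass_ops: set[str] = {"aten.clone", "aten.relu"}


@register_backend_pass
class BackendFusePairsPass(SharedFusePairsPass):
    # The union builds a new set, so the shared default is left untouched.
    bypass_ops: set[str] = (
        SharedFusePairsPass.bypass_ops
        | {"backend.quantize_per_tensor", "backend.dequantize_per_tensor"}
    )


assert SharedFusePairsPass.bypass_ops == {"aten.clone", "aten.relu"}
assert SharedFusePairsPass.bypass_ops < BackendFusePairsPass.bypass_ops
assert PASS_REGISTRY == [BackendFusePairsPass]
```

Using `|` rather than mutating the inherited set in place keeps the shared class usable by other backends with its original defaults.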

Minor observations

  1. get_shape name collision: permute_pass_utils.py defines a get_shape(graph_module, node) function, while backends/transforms/__init__.py already exports a different get_shape(input_node) from addmm_mm_to_linear.py. These have different signatures and semantics. Since neither is re-exported from the same path at import time, this isn't a runtime conflict, but it could be confusing. Consider renaming one of them in a follow-up.

  2. Test coverage — The PR notes that 2 of the 6 passes (FuseTransposeOrPermuteOpPairsPass and RemovePermutesAroundElementwiseOps) are not tested in OSS due to quantized op registration requirements. This is a reasonable pragmatic choice, but it would be good to track adding those tests when possible.

  3. arm/process_node.py change — The identity dim_order short-circuit (if dim_order == tuple(range(len(dim_order))): return np_tensor) is a nice optimization bundled in the same PR. It's correct but logically separate from the permute pass refactor.
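
The identity short-circuit mentioned in the last item can be sketched in isolation. This is a minimal illustration, not the code in arm/process_node.py: `apply_dim_order` and `fake_permute` are hypothetical names, and the permute itself is injected as a callback so the example runs without NumPy.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def apply_dim_order(
    tensor: T,
    dim_order: Sequence[int],
    permute_fn: Callable[[T, tuple[int, ...]], T],
) -> T:
    """Permute `tensor` into `dim_order`, skipping the work for the identity order."""
    # Normalize to a tuple so a list such as [0, 1, 2] also matches the identity.
    order = tuple(dim_order)
    if order == tuple(range(len(order))):
        return tensor
    return permute_fn(tensor, order)


calls: list[tuple[int, ...]] = []


def fake_permute(tensor, order):
    """Hypothetical permute that records the orders it was asked to apply."""
    calls.append(order)
    return ("permuted", tensor, order)


data = object()
assert apply_dim_order(data, [0, 1, 2], fake_permute) is data  # identity: no call
assert calls == []
assert apply_dim_order(data, [0, 2, 1], fake_permute) == ("permuted", data, (0, 2, 1))
assert calls == [(0, 2, 1)]
```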

Summary

The refactoring approach is sound — extracting shared base classes into backends/transforms/ and having cadence subclass them to add decorators and ops. The test file is thorough for the passes it covers. The main issue to fix before merging is the Python 3.10 compatibility problem with typing.override, plus running the linter to fix import ordering and unused imports.



@digantdesai digantdesai left a comment


Thanks, this move makes sense. Running claude review to catch nits.

@meta-codesync meta-codesync bot changed the title Move permute optimization passes to shared transforms location Move permute optimization passes to shared transforms location (#19002) Apr 20, 2026
meta-codesync bot pushed a commit that referenced this pull request Apr 20, 2026
@meta-codesync meta-codesync bot force-pushed the export-D101459577 branch from 1d774d3 to a7beb8b Compare April 20, 2026 23:22
meta-codesync bot pushed a commit that referenced this pull request Apr 20, 2026
@meta-codesync meta-codesync bot force-pushed the export-D101459577 branch from a7beb8b to ec8679d Compare April 20, 2026 23:42
@meta-codesync meta-codesync bot force-pushed the export-D101459577 branch from ec8679d to 298b640 Compare April 21, 2026 01:47

Labels

CLA Signed, fb-exported, meta-exported
