[VL] Fix gflags dual-registration abort on macOS arm64 #12100
Open
jackylee-ch wants to merge 1 commit into
On macOS arm64, libvelox.dylib has static gflags baked in via Folly
(Velox builds Folly with -DGFLAGS_SHARED=FALSE). Without special
handling, libgluten.dylib transitively pulls dynamic gflags via the
INTERFACE_LINK_LIBRARIES of glog::glog and Folly::folly. At JVM
load time, dyld coalesces the weak C++ function-local-static guard
inside FlagRegistry::GlobalRegistry() across the two dylibs. Both
copies then register "flagfile" against the same registry and gflags'
duplicate-flag check aborts the process before any user code runs:
ERROR: flag 'flagfile' was defined more than once
(in files '.../gflags.cc' and '.../gflags.cc')
... is being linked both statically and dynamically.
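The failure mode above can be modelled in a few lines. This is only a toy sketch: `ToyFlagRegistry` and `registerBuiltinFlags` are hypothetical names, not gflags' real implementation, and the real check prints the error and exits rather than throwing. It shows why two copies of the library that end up sharing one registry collide on the first built-in flag.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Toy model of a process-wide flag registry (hypothetical, not gflags code).
class ToyFlagRegistry {
 public:
  // Records that `name` is defined in `file`; rejects a second definition.
  void registerFlag(const std::string& name, const std::string& file) {
    auto result = flags_.emplace(name, file);
    if (!result.second) {
      throw std::runtime_error("flag '" + name +
                               "' was defined more than once (in files '" +
                               result.first->second + "' and '" + file + "')");
    }
  }

  std::size_t size() const { return flags_.size(); }

 private:
  std::map<std::string, std::string> flags_;  // flag name -> defining file
};

// Models the static initializer of one linked-in copy of the library:
// each copy registers the same built-in flag name.
void registerBuiltinFlags(ToyFlagRegistry& registry,
                          const std::string& sourceFile) {
  registry.registerFlag("flagfile", sourceFile);
}
```

Calling `registerBuiltinFlags` twice on the same `ToyFlagRegistry` (the coalesced case) throws on the second call, while giving each copy its own registry (the hidden-visibility case) succeeds for both.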
Linux is unaffected because (a) ELF does not coalesce weak symbols
across .so boundaries by default, and (b) Gluten already uses
symbols.map to control libgluten.so's export surface. macOS has no
version-script equivalent, so a different mechanism is required.
This change fixes the abort end-to-end for macOS while leaving Linux
and Windows build/link semantics untouched.
1. cpp/CMake/Findglog.cmake: on Darwin, prefer the static libglog.a
and force gflags_component=static. When both archives are present
we replace the imported google::glog target with an INTERFACE
IMPORTED target whose INTERFACE_LINK_OPTIONS carry
`LINKER:-load_hidden,<libglog.a>` and
`LINKER:-load_hidden,<libgflags.a>`. -load_hidden is the Apple
ld64 flag that gives every symbol pulled from the archive hidden
visibility, preventing dyld from coalescing them across dylibs.
The static gflags archive path is resolved by inspecting
IMPORTED_LOCATION_RELEASE / _NOCONFIG / * on
gflags::gflags_static.
2. cpp/core/utils/GflagsStubDarwin.cc (new): exports a no-op
google::HandleCommandLineHelpFlags with default visibility.
Velox's archive of gflags pulls gflags.cc.o but never references
gflags_reporting.cc.o, so once -load_hidden makes the real copy
invisible, the dynamic linker would fail to resolve this symbol
at dlopen time. The stub resolves it from libgluten.dylib instead.
3. cpp/core/CMakeLists.txt: conditionally adds the stub to the
gluten target on APPLE.
4. cpp/velox/CMakeLists.txt: on Darwin, links google::glog as PUBLIC
on the velox target so its INTERFACE_LINK_OPTIONS propagate
through libvelox.dylib to test binaries and benchmarks. PRIVATE
linkage on the gluten target is intentional for Linux (symbols.map
handles it), but on Darwin Folly::folly's INTERFACE_LINK_LIBRARIES
pulls libgflags.a into libvelox.dylib and any test executables
with default visibility, reviving the same dual-registration
abort at test startup.
5. cpp/velox/compute/VeloxBackend.cc: guards
google::InitGoogleLogging with IsGoogleLoggingInitialized() and
makes VeloxBackend::create() idempotent. Multi-suite gtest
binaries on macOS re-enter VeloxBackend::init from each
SetUpTestSuite, which would otherwise trigger glog's "You called
InitGoogleLogging() twice!" check and Gluten's
Registry "Required object already registered" check.
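As an illustration of item 2, a stub of this kind could look roughly like the following. This is a sketch, not the contents of GflagsStubDarwin.cc: it assumes the gflags API lives in namespace `google` (as the symbol name in the description suggests) and that the function takes no arguments and returns `void`, matching its declaration in the gflags headers.

```cpp
// Sketch of a no-op replacement for the gflags help-flag handler.
// Assumption: the symbol the loader needs is google::HandleCommandLineHelpFlags()
// with a void() signature.
namespace google {

// Default visibility keeps the symbol exported from the enclosing dylib even
// when the rest of the gflags symbols are hidden. The body is intentionally
// empty: no help flags are processed.
__attribute__((visibility("default"))) void HandleCommandLineHelpFlags() {}

}  // namespace google
```

Because the body does nothing, `--help`-style flags handled by the real function are simply ignored in this configuration.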
Verification:
- nm -g libvelox.dylib | grep "google.*ParseCommandLine" -> empty
(gflags symbols are not exported across the dylib boundary)
- nm libvelox.dylib | grep FlagRegistry -> all lowercase t / b
(every FlagRegistry symbol is local to libvelox.dylib)
- velox_shuffle_writer_test runs 5436/5436 cases cleanly on
macOS 14 arm64 with Apple Clang 17.
- Linux x86_64 build green, no link/load behavior change.
What changes are proposed in this pull request?
When loading `libgluten.dylib` on macOS arm64, the JVM aborts during the `System.loadLibrary` call with gflags' "flag 'flagfile' was defined more than once" error.

The root cause is dyld weak-symbol coalescing across two dylibs that each contain their own copy of gflags:

- `libvelox.dylib`: `libgflags.a`, baked in via Folly (Velox builds Folly with `-DGFLAGS_SHARED=FALSE`)
- `libgluten.dylib`: `libgflags.dylib`, pulled transitively through the `INTERFACE_LINK_LIBRARIES` of `glog::glog` / `Folly::folly`

On macOS, dyld coalesces the weak C++ function-local-static guard inside `FlagRegistry::GlobalRegistry()` between the two dylibs. Both copies then register `--flagfile` against the same registry, and gflags' duplicate-flag check aborts the process before any user code runs.
Linux is unaffected because (a) ELF does not coalesce weak symbols across shared objects by default, and (b) Gluten already uses `symbols.map` to control the export surface of `libgluten.so`. macOS has no version-script equivalent, so this PR uses a different mechanism. All Darwin-specific logic is gated on `APPLE` / `CMAKE_SYSTEM_NAME STREQUAL "Darwin"`; Linux and Windows build and link semantics are untouched.
The fix has five parts that all need to be in place to fully eliminate the
abort across the production load path and the test executables:
1. `cpp/CMake/Findglog.cmake`: On Darwin, prefer the static `libglog.a` and force `gflags_component=static`. When both archives are available, we replace the imported `google::glog` target with an `INTERFACE IMPORTED` target whose `INTERFACE_LINK_OPTIONS` carry `LINKER:-load_hidden,<libglog.a>` and `LINKER:-load_hidden,<libgflags.a>`. `-load_hidden` is the Apple ld64 flag that gives every symbol pulled from the archive hidden visibility, which prevents dyld from coalescing them across dylibs. We resolve the static gflags archive path by inspecting `IMPORTED_LOCATION_RELEASE / _NOCONFIG / *` on `gflags::gflags_static`.
2. `cpp/core/utils/GflagsStubDarwin.cc` (new): Exports a no-op `google::HandleCommandLineHelpFlags` with default visibility. Velox's archive of gflags pulls `gflags.cc.o` but never references `gflags_reporting.cc.o`, so once `-load_hidden` makes the real copy invisible, the dynamic linker would fail to resolve this symbol at dlopen time. The stub resolves it from `libgluten.dylib` instead.
3. `cpp/core/CMakeLists.txt`: Conditionally adds the stub to the `gluten` target on `APPLE`.
4. `cpp/velox/CMakeLists.txt`: On Darwin, links `google::glog` as `PUBLIC` on the `velox` target so its `INTERFACE_LINK_OPTIONS` propagate through `libvelox.dylib` to test binaries and benchmarks. The default `PRIVATE` linkage on `gluten` is intentional for Linux (`symbols.map` handles it), but on Darwin `Folly::folly`'s `INTERFACE_LINK_LIBRARIES` pulls `libgflags.a` into `libvelox.dylib` and any test executables with default visibility, reviving the same dual-registration abort at test startup.
5. `cpp/velox/compute/VeloxBackend.cc`: Guards `google::InitGoogleLogging` with `IsGoogleLoggingInitialized()` and makes `VeloxBackend::create()` idempotent. Multi-suite gtest binaries on macOS re-enter `VeloxBackend::init` from each `SetUpTestSuite`, which would otherwise trigger glog's `"You called InitGoogleLogging() twice!"` check and Gluten's `Registry "Required object already registered"` check.
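The guard-then-initialize pattern from part 5 can be sketched without glog as follows. Everything here is a stand-in: `ToyLogging` plays the role of a logging library that rejects a second init call, and `ToyBackend::create()` is a hypothetical analogue of the idempotent entry point, not Gluten's actual code.

```cpp
#include <mutex>
#include <stdexcept>

// Stand-in for a logging library whose init may run only once per process.
struct ToyLogging {
  bool initialized = false;
  int initCalls = 0;

  bool isInitialized() const { return initialized; }

  void init() {
    if (initialized) {
      throw std::logic_error("init called twice");
    }
    initialized = true;
    ++initCalls;
  }
};

// Hypothetical backend whose create() is safe to call from every test suite.
class ToyBackend {
 public:
  explicit ToyBackend(ToyLogging& logging) : logging_(logging) {}

  void create() {
    std::lock_guard<std::mutex> lock(mutex_);
    if (created_) {
      return;  // Repeated calls are no-ops.
    }
    if (!logging_.isInitialized()) {
      logging_.init();  // Skip if something else already initialized logging.
    }
    created_ = true;
    ++createCount_;
  }

  int createCount() const { return createCount_; }

 private:
  ToyLogging& logging_;
  std::mutex mutex_;
  bool created_ = false;
  int createCount_ = 0;
};
```

Two guards are needed because the two failure modes are independent: logging may already have been initialized by another component, and `create()` itself may be re-entered by later test suites.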
How was this patch tested?
Built on macOS 14 arm64 with Apple Clang 17 and the Homebrew toolchain.
Symbol audit (after the fix): in the `nm libvelox.dylib` output, all `FlagRegistry` symbols are lowercase (`t` = local text, `b` = local bss); none are exported across the dylib boundary, so dyld has nothing to coalesce.
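A check like this audit could be automated with a small helper that scans captured `nm` output. The sketch below is an assumption about how one might do it, not part of the PR: `allMatchingSymbolsLocal` is a hypothetical name, and it assumes the usual `address type name` line layout, with the address column blank for undefined symbols.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Returns true when every nm line mentioning `needle` carries a lowercase
// type letter (read here as "local", as in the audit above). Note that on
// ELF a few lowercase letters such as 'w' and 'v' denote weak symbols, so
// this reading is only a heuristic outside Mach-O.
bool allMatchingSymbolsLocal(const std::string& nmOutput,
                             const std::string& needle) {
  std::istringstream lines(nmOutput);
  std::string line;
  while (std::getline(lines, line)) {
    if (line.find(needle) == std::string::npos) {
      continue;  // Not a symbol we care about.
    }
    std::istringstream fields(line);
    std::vector<std::string> tokens;
    std::string token;
    while (fields >> token) {
      tokens.push_back(token);
    }
    if (tokens.empty()) {
      continue;
    }
    // If the first token is longer than one character it is the address,
    // and the type letter follows it; otherwise the first token is the type.
    const std::size_t typeIndex = tokens[0].size() > 1 ? 1 : 0;
    if (typeIndex >= tokens.size() || tokens[typeIndex].size() != 1) {
      return false;  // Unexpected layout: fail closed.
    }
    const unsigned char type =
        static_cast<unsigned char>(tokens[typeIndex][0]);
    if (!std::islower(type)) {
      return false;  // Uppercase letter: externally visible or undefined.
    }
  }
  return true;
}
```

Feeding it the text of `nm libvelox.dylib` with `"FlagRegistry"` as the needle would return `true` only if no matching symbol has an uppercase type letter.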
Behavioral validation:

- Without the fix, `dlopen("libgluten.dylib")` aborts before any test reaches `main()`.
- With the fix, `cpp/build/velox/tests/velox_shuffle_writer_test` runs 5436 / 5436 cases cleanly on macOS 14 arm64.
- The following JVM test suites (those that exercise native load without query execution) all pass on macOS arm64:
  - `org.apache.gluten.utils.VeloxBloomFilterTest`
  - `org.apache.gluten.columnarbatch.ColumnarBatchTest`
  - `org.apache.gluten.backendsapi.VeloxListenerApiTest`
  - `org.apache.gluten.fs.OnHeapFileSystemTest`
  - `org.apache.gluten.vectorized.ArrowColumnVectorTest`
- The test run in `cpp/build` reports 5574 / 5585 pass; the 11 failures are unrelated upstream Velox issues exposed by the recent `dft-2026_05_13` bump (HYPERLOGLOG cast registration tightening, `Type::equivalent()` regression on identically-printed ROW types) and are not caused by this PR.
Linux: all changes are gated behind `APPLE` / Darwin checks, so no behavioral change on Linux is expected. Local Ubuntu build verified clean.
Was this patch authored or co-authored using generative AI tooling?
co-auth: Claude (Sonnet/Opus) via Claude Code 1.x