feat(catalog): hadoop table and namespace CRUD operations #969

tanmayrauth wants to merge 4 commits into apache:main from
Conversation
Force-pushed from 3305012 to fb3dd76
@laskoviymishka @zeroshade can you please review this PR?
```go
info, err := os.Stat(nsPath)
if os.IsNotExist(err) || (err == nil && !info.IsDir()) {
	return nil, fmt.Errorf("%w: %s", catalog.ErrNoSuchNamespace, strings.Join(ns, "."))
}
```
Shouldn't this support customizable file systems beyond just local? i.e. shouldn't this use the io package?
This is intentionally local-only for now to match the scoped plan (local parity with Spark's Java HadoopCatalog first). The io.IO interface doesn't currently have Stat or MkdirAll equivalents needed for directory-based namespace operations, so switching to it would require extending the interface. I'll open a follow-up issue to add something like StatableIO and refactor to use icebergio.IO throughout for HDFS/cloud support.
Fair enough. Let's continue to use pkg.go.dev/io/fs as inspiration for any changes we make to the IO package.
zeroshade left a comment:
LGTM, just update the docstrings for NewCatalog/Catalog to specify that this only supports local filesystem for now.
Updated the docstring.
Looks good, just need to resolve the conflicts!
…p catalog

Implement the three core table operations:
- CreateTable: validates namespace exists, rejects custom locations, writes v1.metadata.json via temp-file+rename, updates version hint
- LoadTable: uses findVersion with three-tier fallback, delegates to table.NewFromLocation for metadata parsing
- CheckTableExists: delegates to isTableDir

Relates to apache#798
Depends-on: PR 2 (version-hint), PR 3 (namespace-ops)
Depended-on-by: PR 6 (CommitTable)
Add cross-compatibility integration tests verifying CreateTable, LoadTable, and CheckTableExists work between Go and Spark Hadoop catalogs. Pre-create the hadoop-warehouse directory before Docker compose to ensure runner ownership in CI.
Update Catalog and NewCatalog docstrings to note that only local filesystem paths are currently supported.
Force-pushed from bd4feb8 to 82061f7
Resolved the conflicts.
4: CreateTable + LoadTable + CheckTableExists
Implement the three core table operations:
- CreateTable validates the namespace exists, rejects custom locations, builds metadata via table.NewMetadata, writes v1.metadata.json through a temp-file-plus-rename pattern, and does a best-effort version-hint write.
- LoadTable calls findVersion to get the current version, builds the metadata path, and delegates to table.NewFromLocation.
- CheckTableExists delegates to isTableDir.

Tests cover create-and-load round-trip, create with partition spec / sort order / properties, reject custom location, create in non-existent namespace, create duplicate, load non-existent, load with stale hint, and check exists true/false.
Depends on #968 #963
Relates to #798