Things to do:
- treat `missing` as a special value that is not pooled, probably with level `0`. This would work the same way as in CategoricalArrays.jl; the benefit is that two `PooledArrays` that differ only in whether they allow `Missing` could share a pool
- add locking for `setindex!`, but make sure that we support batch operations that add levels (both in `setindex!` and in e.g. `copyto!`); this would allow us to fully drop copy-on-write and never copy pool and invpool by default; tentatively, `unsafe_setindex!` would be an alternative that does not use the lock
- stress in the documentation that using `invpool` is not safe if other threads may be modifying it (this should not be a problem)
- add `droplevels!` to DataAPI.jl and to PooledArrays.jl (this also requires a change in CategoricalArrays.jl); this function would reduce pool and invpool to only the used levels and at the same time make a fresh copy of them (as a way to detach pool and invpool between PooledArrays)
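
To make the last item concrete, here is a rough sketch of the semantics I have in mind. It uses a toy type rather than the real `PooledArray` internals, and `ToyPooled` and `toy_droplevels` are made-up names for illustration only; the real `droplevels!` would work in place and its signature is still to be decided.

```julia
# Toy stand-in for a pooled array; NOT the PooledArrays.jl implementation.
struct ToyPooled{T}
    refs::Vector{UInt32}      # refs[i] is an index into pool
    pool::Vector{T}           # level values
    invpool::Dict{T,UInt32}   # level value => index into pool
end

# Hypothetical, non-mutating illustration of what droplevels! would do:
# keep only the levels that are actually referenced and return fresh
# pool/invpool objects that are no longer shared with the input.
function toy_droplevels(pa::ToyPooled{T}) where {T}
    used = sort!(unique(pa.refs))    # level indices that occur in refs
    remap = Dict(old => UInt32(new) for (new, old) in enumerate(used))
    newpool = pa.pool[used]          # indexing allocates a new vector
    newinvpool = Dict{T,UInt32}(v => UInt32(i) for (i, v) in enumerate(newpool))
    newrefs = UInt32[remap[r] for r in pa.refs]
    return ToyPooled{T}(newrefs, newpool, newinvpool)
end

# Level "b" is unused, so it is dropped and the refs are renumbered.
pa = ToyPooled{String}(UInt32[1, 3, 3], ["a", "b", "c"],
                       Dict("a" => UInt32(1), "b" => UInt32(2), "c" => UInt32(3)))
pb = toy_droplevels(pa)
```

After the call, `pb` has its own pool and invpool, so adding levels to one array can no longer affect the other.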
I think this design is better than a global pool. It will still cost us a bit in the H2O benchmarks, but at least we avoid a global pool that is not reclaimable.
@nalimilan + @quinnj : any additional comments on this?