Based on this thread. We should discuss whether to allow people to create custom evaluation procedures (e.g., “30 times 3-fold cross-validation”) and, if so, how to support them. Possible concerns include spotty availability (not every procedure may be supported for every task type, so the task type × procedure matrix could end up only partially filled) and splintering of experimental results (runs on the same data evaluated with different procedures become harder to compare). But there are of course also obvious benefits :)
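
To make the example concrete, here is a rough sketch of what a “30 times 3-fold” procedure would have to produce: 30 independent shuffles of the data, each cut into 3 folds, for 90 train/test splits in total. The function name `repeated_kfold` and its parameters are hypothetical and only for illustration; this is not an existing API, and the real thing might well need stratification or fixed, server-generated splits.

```python
import random


def repeated_kfold(n_samples, n_repeats=30, n_folds=3, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV.

    Hypothetical helper for illustration only; plain (unstratified) folds.
    """
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        # Fresh shuffle per repeat, so each repeat partitions the data differently.
        indices = list(range(n_samples))
        rng.shuffle(indices)
        for fold in range(n_folds):
            # Every n_folds-th shuffled index goes into this fold's test set.
            test_idx = indices[fold::n_folds]
            test_set = set(test_idx)
            train_idx = [i for i in indices if i not in test_set]
            yield repeat, fold, train_idx, test_idx


# 30 repeats x 3 folds on a toy dataset of 12 samples.
splits = list(repeated_kfold(n_samples=12))
print(len(splits))
```

A custom procedure would then mostly be a choice of `n_repeats`, `n_folds`, and the splitting rule, which is also where the availability concern comes from: each such combination has to be generated and stored per task.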