Post the datasets

Clone the EMBO/sd-nlp repo if not yet done:

git clone https://huggingface.co/datasets/EMBO/sd-nlp
cd sd-nlp

Copy the relevant dataset files

cp ../soda-roberta/data/json/220304_sd_panels.zip ./sd_panels.zip
cp ../soda-roberta/data/json/220304_sd_fig.zip ./sd_figs.zip

Copy the loading script:

cp ../soda-roberta/smtag/loader/loader_tokcl.py sd-nlp.py

Make sure .jsonl files are tracked as lfs objects:

git lfs install
git lfs track *.zip
git add .gitattributes
git commit -m "tracking *.zip files"

Add and commit files:

git status
git add -A
git commit -m "updating data files and loader script"
git push

Provide feedback