The recent data gap with THEMIS GBO data revealed some areas where the download and processing scripts could be improved.
The primary change is that we need to switch from using wget to rsync, because the new Calgary server does not allow wget.
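As a starting point, the wget mirror call could be replaced with an equivalent rsync invocation. A minimal sketch of building such a command; the host and paths below are placeholders, not the actual Calgary server layout:

```python
# Sketch of building an rsync command to replace the wget-based mirror.
# The host, remote path, and local path are placeholders; the real Calgary
# server address and directory layout would need to be substituted.

def build_rsync_cmd(host: str, remote_path: str, local_path: str,
                    dry_run: bool = False) -> list[str]:
    """Build the argv list for an rsync mirror of remote_path into local_path."""
    cmd = [
        "rsync",
        "-av",            # archive mode (preserve times/perms), verbose
        "--partial",      # keep partially transferred files on interruption
        "--timeout=300",  # give up on a stalled transfer after 5 minutes
    ]
    if dry_run:
        cmd.append("--dry-run")
    cmd.append(f"{host}:{remote_path}")
    cmd.append(local_path)
    return cmd

# Example with placeholder host/paths:
cmd = build_rsync_cmd("data.example.ucalgary.ca", "/themis/gbo/",
                      "/mirror/gbo/", dry_run=True)
```

Running the command (e.g. via `subprocess.run(cmd, check=True)`) would then replace the current wget call in the download script.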
In the case of the download script, the actual wget calls complete relatively quickly; however, the script seems to spend a long time on database updates, calling gbo_uc_rmd_mirror.php and updating a single file per query. We need to:
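If the slowdown really is one database update per file, batching the updates into a single transaction should help. A minimal sketch using sqlite3 purely as a stand-in, since the actual database and schema behind gbo_uc_rmd_mirror.php are not described here:

```python
import sqlite3

# Stand-in schema for illustration only; the real database behind
# gbo_uc_rmd_mirror.php is not shown in these notes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mirrored_files (path TEXT PRIMARY KEY, size INTEGER)")

# Pretend these are the files a download run just mirrored.
downloaded = [(f"/mirror/gbo/file_{i}.rmd", 1024 * i) for i in range(1000)]

# One executemany inside one transaction, instead of one round trip per file.
with conn:
    conn.executemany(
        "INSERT OR REPLACE INTO mirrored_files (path, size) VALUES (?, ?)",
        downloaded,
    )

count = conn.execute("SELECT COUNT(*) FROM mirrored_files").fetchone()[0]
```

The same idea applies regardless of the actual backend: accumulate the per-file records during the download pass and commit them together, rather than issuing one query per file.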
In the case of the processing script, completing the backlog took several days; this is acceptable on its own, but because the script works by checking a single location for files containing directory information, it interferes with routine processing not just for GBO sites, but for all networks whose data we retrieve via RMD files. Even if the processing script were run on a different machine, the directory files and the directory that the script checks would also need to be different (to prevent the routine RMD processing script from trying to process the same files).
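One way to keep a backlog run from colliding with routine RMD processing would be to give each mode its own location for the directory-information files. The paths and mode names below are illustrative only; the real directory-file location used by the scripts is not shown in these notes:

```python
from pathlib import Path

# Illustrative layout: each processing mode watches its own directory,
# so a backlog run never leaves files where the routine script looks.
STATE_ROOTS = {
    "routine": Path("/data/rmd/dirfiles"),          # watched by the routine script
    "backlog": Path("/data/rmd/dirfiles_backlog"),  # watched only by backlog runs
}

def dirfile_root(mode: str) -> Path:
    """Return the directory-file location for the given processing mode."""
    return STATE_ROOTS[mode]

backlog_root = dirfile_root("backlog")
routine_root = dirfile_root("routine")
```

With a layout like this, a backlog run on any machine would write its directory files under its own root, and the routine RMD script would never see, or re-process, the same files.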
Additionally, processing a single month's worth of data seems to take a long time, but the exact reason is currently unknown. We need to:
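Since the cause of the slow month-level processing is unknown, a reasonable first step would be to profile one month's run and see where the time actually goes. A sketch using cProfile, with a dummy workload standing in for the real processing function:

```python
import cProfile
import io
import pstats

def process_month(n_files: int = 200) -> int:
    """Dummy stand-in for the real per-month processing loop."""
    total = 0
    for _ in range(n_files):
        total += sum(range(1000))  # placeholder for per-file work
    return total

profiler = cProfile.Profile()
profiler.enable()
process_month()
profiler.disable()

# Summarize the functions where the most cumulative time was spent.
buf = io.StringIO()
stats = pstats.Stats(profiler, stream=buf).sort_stats("cumulative")
stats.print_stats(10)
report = buf.getvalue()
```

Wrapping the actual month-processing entry point this way (or running the script under `python -m cProfile -o month.prof ...`) would show whether the time is going to I/O, database calls, or the processing itself.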