The recent data gap with THEMIS GBO data revealed some areas where the download and processing scripts could be improved.
The primary change is that we need to switch from using wget to rsync, because the new Calgary server does not allow wget.
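As a starting point, the wget mirror call could be replaced with an equivalent rsync invocation. A minimal sketch of building such a command; the host and paths below are placeholders, not the actual Calgary server layout:

```python
# Sketch of building an rsync command to replace the wget-based mirror.
# The host, remote path, and local path are placeholders; the real Calgary
# server address and directory layout would need to be substituted.

def build_rsync_cmd(host: str, remote_path: str, local_path: str,
                    dry_run: bool = False) -> list[str]:
    """Build the argv list for an rsync mirror of remote_path into local_path."""
    cmd = [
        "rsync",
        "-av",            # archive mode (preserve times/perms), verbose
        "--partial",      # keep partially transferred files on interruption
        "--timeout=300",  # give up on a stalled transfer after 5 minutes
    ]
    if dry_run:
        cmd.append("--dry-run")
    cmd.append(f"{host}:{remote_path}")
    cmd.append(local_path)
    return cmd

# Example with placeholder host/paths:
cmd = build_rsync_cmd("data.example.ucalgary.ca", "/themis/gbo/",
                      "/mirror/gbo/", dry_run=True)
```

Running the command (e.g. via `subprocess.run(cmd, check=True)`) would then replace the current wget call in the download script.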
In the case of the download script, the actual wget calls complete relatively quickly; however, the script seems to spend a long time on database updates, calling gbo_uc_rmd_mirror.php and updating a single file per query. We need to:
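If the slowdown really is one database update per file, batching the updates into a single transaction should help. A minimal sketch using sqlite3 purely as a stand-in, since the actual database and schema behind gbo_uc_rmd_mirror.php are not described here:

```python
import sqlite3

# Stand-in schema for illustration only; the real database behind
# gbo_uc_rmd_mirror.php is not shown in these notes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mirrored_files (path TEXT PRIMARY KEY, size INTEGER)")

# Pretend these are the files a download run just mirrored.
downloaded = [(f"/mirror/gbo/file_{i}.rmd", 1024 * i) for i in range(1000)]

# One executemany inside one transaction, instead of one round trip per file.
with conn:
    conn.executemany(
        "INSERT OR REPLACE INTO mirrored_files (path, size) VALUES (?, ?)",
        downloaded,
    )

count = conn.execute("SELECT COUNT(*) FROM mirrored_files").fetchone()[0]
```

The same idea applies regardless of the actual backend: accumulate the per-file records during the download pass and commit them together, rather than issuing one query per file.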
In the case of the processing script, completing the backlog took several days; this is acceptable on its own, but because the script works by checking a single location for files containing directory information, it interferes with routine processing not just for GBO sites, but for all networks whose data we retrieve via RMD files. Even if the processing script were run on a different machine, the directory files and the directory that the script checks would also need to be different (to prevent the routine RMD processing script from trying to process the same files).
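One way to keep a backlog run from colliding with routine RMD processing would be to give each mode its own location for the directory-information files. The paths and mode names below are illustrative only; the real directory-file location used by the scripts is not shown in these notes:

```python
from pathlib import Path

# Illustrative layout: each processing mode watches its own directory,
# so a backlog run never leaves files where the routine script looks.
STATE_ROOTS = {
    "routine": Path("/data/rmd/dirfiles"),          # watched by the routine script
    "backlog": Path("/data/rmd/dirfiles_backlog"),  # watched only by backlog runs
}

def dirfile_root(mode: str) -> Path:
    """Return the directory-file location for the given processing mode."""
    return STATE_ROOTS[mode]

backlog_root = dirfile_root("backlog")
routine_root = dirfile_root("routine")
```

With a layout like this, a backlog run on any machine would write its directory files under its own root, and the routine RMD script would never see, or re-process, the same files.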
Additionally, processing a single month's worth of data seems to take a long time, but the exact reason is currently unknown. We need to:
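Since the cause of the slow month-level processing is unknown, a reasonable first step would be to profile one month's run and see where the time actually goes. A sketch using cProfile, with a dummy workload standing in for the real processing function:

```python
import cProfile
import io
import pstats

def process_month(n_files: int = 200) -> int:
    """Dummy stand-in for the real per-month processing loop."""
    total = 0
    for _ in range(n_files):
        total += sum(range(1000))  # placeholder for per-file work
    return total

profiler = cProfile.Profile()
profiler.enable()
process_month()
profiler.disable()

# Summarize the functions where the most cumulative time was spent.
buf = io.StringIO()
stats = pstats.Stats(profiler, stream=buf).sort_stats("cumulative")
stats.print_stats(10)
report = buf.getvalue()
```

Wrapping the actual month-processing entry point this way (or running the script under `python -m cProfile -o month.prof ...`) would show whether the time is going to I/O, database calls, or the processing itself.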