Moving and viewing data
Moving or copying files on a local machine or between remote servers is a skill well worth practicing.
Many programs for copying files use the following syntax:
<Copy-program> <source> <destination>
where <Copy-program> is the name of the program used for copying the data (e.g. cp, scp or rsync), <source> is the file that should be copied, and <destination> is where the data should be copied to.
The cp program is the standard command for copying files from one part of a filesystem to another. It uses the syntax described above: the first argument is the name of the file to copy (possibly including a full or relative path) and the second argument is the location to copy the file to.
[mtop@albiorix files]$ cp /etc/hostname .
In the example above, the source is the file /etc/hostname and the destination is your current working directory, here represented by the shortcut "." (a single dot).
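The following is a minimal sketch of a few common cp patterns, run inside a temporary directory so that no existing files are affected. The file and directory names (source_dir, backup_dir, notes.txt) are hypothetical and only serve as illustration.

```shell
# Work in a throwaway directory so nothing on the real system is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create a small source file and a directory to copy into (hypothetical names).
mkdir source_dir backup_dir
echo "example content" > source_dir/notes.txt

# 1. Copy a file into the current working directory (".").
cp source_dir/notes.txt .

# 2. Copy a file and give the copy a new name at the destination.
cp source_dir/notes.txt backup_dir/notes_copy.txt

# 3. Copy a whole directory, including its contents, with -r (recursive).
cp -r source_dir source_dir_copy

# Show the resulting directory tree.
ls -R
```

Without -r, cp refuses to copy a directory, which is a useful safeguard against copying large directory trees by accident.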
Downloading data from the web can be done using several different programs. The most common method is to direct a regular web browser to a URL - from there, the download either starts automatically, or you have to save the content displayed manually. This can sometimes be a complicated procedure and a simpler method is to use the wget program, which will download the file to your working directory in a single step. Here is an example:
[mtop@albiorix files]$ wget https://github.com/mtop/speciesgeocoder/archive/1.1.0.tar.gz
You can view the file size of the downloaded file with the following command:
[mtop@albiorix files]$ ls -lh 1.1.0.tar.gz
-rw-r--r--. 1 1000740000 1000740000 16M Dec 11 14:01 1.1.0.tar.gz
The file you have just downloaded has the suffix tar.gz, which indicates that it is a tar-file (basically several files bundled together in a single file) that is compressed with gzip. To uncompress and extract the files from this tar-file, use the following command:
[mtop@albiorix files]$ tar -zxvf 1.1.0.tar.gz
speciesgeocoder-1.1.0/
speciesgeocoder-1.1.0/.gitignore
speciesgeocoder-1.1.0/LICENSE
speciesgeocoder-1.1.0/NEWS
speciesgeocoder-1.1.0/R/
speciesgeocoder-1.1.0/R/.Rhistory
speciesgeocoder-1.1.0/R/1birdtree.nex
speciesgeocoder-1.1.0/R/CAHighlands_SA.txt
...
As you can see, the tar-file contained several files stored in a single directory (speciesgeocoder-1.1.0). To compress this directory into a single tar-file again, use the following command:
[mtop@albiorix files]$ tar -zcvf my_tar-file.tgz speciesgeocoder-1.1.0/
speciesgeocoder-1.1.0/
speciesgeocoder-1.1.0/.gitignore
speciesgeocoder-1.1.0/LICENSE
speciesgeocoder-1.1.0/NEWS
speciesgeocoder-1.1.0/R/
speciesgeocoder-1.1.0/R/.Rhistory
speciesgeocoder-1.1.0/R/1birdtree.nex
speciesgeocoder-1.1.0/R/CAHighlands_SA.txt
...
Note the differences compared to the first tar command you used: the -x flag (extract) has been replaced by -c (create). You can read more in the manual pages for tar. After the flags, the first argument is the name of the new tar-file you are creating (my_tar-file.tgz) and the second argument is the file(s) you want to include. Also note that the file extension .tgz is sometimes used instead of .tar.gz.
As you can see, the directory speciesgeocoder-1.1.0 is still there after the compression, alongside a compressed copy called my_tar-file.tgz.
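As a minimal sketch of the full round trip (create, inspect, extract), the commands below build a small example directory in a temporary location, so they can be run without downloading anything. The names demo_project and extracted are hypothetical. The -t flag, which lists the content of a tar-file without extracting it, and the -C flag, which selects the directory to extract into, are standard tar options not used in the examples above.

```shell
# Work in a throwaway directory so nothing on the real system is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a small example directory to archive (hypothetical names).
mkdir -p demo_project/data
echo "first file" > demo_project/README.txt
echo "second file" > demo_project/data/values.txt

# Create (-c) a gzip-compressed (-z) tar-file with the given file name (-f).
tar -zcf demo_project.tar.gz demo_project/

# List (-t) the content of the tar-file without extracting anything.
tar -ztf demo_project.tar.gz

# Extract (-x) into a separate directory chosen with -C,
# leaving the original demo_project directory untouched.
mkdir extracted
tar -zxf demo_project.tar.gz -C extracted

# Compare the extracted copy with the original directory.
diff -r demo_project extracted/demo_project && echo "identical"
```

Leaving out -v, as done here, makes tar run silently; adding it prints each file name as it is processed, as in the examples above.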
We can now explore the content of speciesgeocoder-1.1.0 and, for example, look at the number of lines in the file speciesgeocoder-1.1.0/geocoder.py...
[mtop@albiorix files]$ wc -l speciesgeocoder-1.1.0/geocoder.py
699 speciesgeocoder-1.1.0/geocoder.py
... which is 699. To extract the first 250 lines we can use this command:
[mtop@albiorix files]$ head -250 speciesgeocoder-1.1.0/geocoder.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Species locality data + polygons -> nexus file
#
# Copyright (C) 2014 Mats Töpel. mats.topel@bioenv.gu.se
#
# Citation: If you use this version of the program, please cite;
# Mats Töpel (2014) Open Laboratory Notebook. www.matstopel.se
...
- Create a new directory called unix_exercise (note that it is bad practice to use "whitespace" in file or directory names; use the "_" character instead).
- Use the command head to extract the first 1000 lines from the file speciesgeocoder-1.1.0/example_data/gbif_Ivesia_localities.txt. Redirect the output from head to a new file called 1000k_ivesia.txt in unix_exercise.
- Inside the unix_exercise directory, create a text file called README.txt (make sure you name it exactly like this, with README in capital letters) using the command touch.
- Open the file README.txt in JupyterHub by double-clicking on it. Add the following information:
  - Your name
  - Date
  - Number of lines in the file speciesgeocoder-1.1.0/example_data/localities.csv
- Create a compressed tar file of unix_exercise and upload it to Canvas.
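Counting lines with wc, extracting lines with head and redirecting output to a file can be sketched on a small generated file, as below. The file names numbers.txt and first_five.txt are hypothetical, and the example runs in a temporary directory. Note that head -n 5 is the portable spelling of the shorter head -5 form used earlier.

```shell
# Work in a throwaway directory so nothing on the real system is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Generate a 20-line example file (seq prints the numbers 1 to 20, one per line).
seq 1 20 > numbers.txt

# Count the lines in the file.
wc -l numbers.txt

# Print the first 5 lines to the screen.
head -n 5 numbers.txt

# Redirect the output of head to a new file instead of the screen.
head -n 5 numbers.txt > first_five.txt

# When wc reads from standard input it prints only the number, not the file name.
line_count=$(wc -l < first_five.txt)
echo "first_five.txt has $line_count lines"
```

The > operator overwrites the target file if it already exists, so it is worth checking the destination name before running the command.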