---
output:
  md_document:
    variant: markdown_github
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = FALSE,
  comment = "#>",
  fig.path = "README-"
)
```
# mineDisProt
The `mineDisProt` package was developed to extract data from unstructured or semi-structured formats and compile it into matrices or data frames that are more amenable to the analysis of intrinsic disorder (ID).
In proteins, intrinsic disorder (ID) describes the lack of a stable, or ordered, tertiary structure in a protein that still maintains its physiological functions. Informatics tools, such as those provided by [PONDR](http://www.pondr.com/) and [PONDR-FIT](http://original.disprot.org/pondr-fit.php), make it easy to analyze a protein sequence for intrinsic disorder; however, this task becomes tedious if one has tens or even hundreds of sequences to analyze.
My goal in developing `mineDisProt` was to simplify the data collection process so you can focus on the analysis of your data set.
## Installation
You can install the development version from [GitHub](https://github.com/) with:
``` r
# install.packages("devtools")
devtools::install_github("vanbibn/mineDisProt")
```
## Examples
First, we need to load the package.
```{r load}
library(mineDisProt)
```
### extract_pondr()
This function extracts numerical data from text produced by the VLXT, VL3, and VSL2 disorder predictors from the Predictor of Natural Disordered Regions ([PONDR](http://www.pondr.com/)). It expects one `.txt` file per protein sequence, containing the Raw Output of all three predictors pasted together.
```{r example_1, warning=FALSE}
pondr_data <- extract_pondr("inst/extdata/pondr_text/")
# here we will view the data from the VLXT predictor
pondr_data[,1:6]
```
###### * You may get warnings about "incomplete final line found [in the text file]."
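The "incomplete final line" warning typically comes from base R's `readLines()` when a text file does not end with a newline character. The following is a minimal sketch of one way to avoid it, assuming the files are plain text and that modifying them in place is acceptable. `add_final_newline()` is a hypothetical helper written for this illustration; it is not part of `mineDisProt`, and the file contents below are invented.

```r
# Hypothetical helper (NOT part of mineDisProt): append a trailing newline
# to a text file if it does not already end with one.
add_final_newline <- function(path) {
  size <- file.info(path)$size
  if (is.na(size) || size == 0) return(invisible(FALSE))

  # Read the file as raw bytes and inspect the last one.
  bytes <- readBin(path, what = "raw", n = size)
  if (bytes[length(bytes)] == charToRaw("\n")) return(invisible(FALSE))

  cat("\n", file = path, append = TRUE)
  invisible(TRUE)
}

# Toy demonstration on a temporary file with invented contents.
tmp <- tempfile(fileext = ".txt")
cat("first line\nsecond line", file = tmp)  # note: no trailing newline
add_final_newline(tmp)
readLines(tmp)                              # reads without the warning
```

To apply the helper to a whole folder, something like `invisible(lapply(list.files(dir, pattern = "\\.txt$", full.names = TRUE), add_final_newline))` should work, where `dir` is the folder you pass to `extract_pondr()`.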
### extract_pondr.noVL3()
This is a variant of `extract_pondr()` for text files that do not contain the raw data for the VL3 predictor.
```{r example_2, warning=FALSE}
extract_pondr.noVL3("inst/extdata/pondr_text_withoutVL3/")
```
### extract_pondrFIT()
This function extracts the relevant data from the temporary URL produced by analyzing a protein sequence in the [PONDR-FIT](http://original.disprot.org/pondr-fit.php) protein disorder meta-predictor. It then calculates average and percent disorder scores from the per-residue scores. Before using this function, collect the URLs in a `.csv` file with the UniProt ID in the first column and the URL in the second.
```{r example_3, message=FALSE}
extract_pondrFIT("inst/extdata/pondrfit-url.csv")
```
###### * You may get messages about "Parsed with column specification:".
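The summary step described above (average and percent disorder from per-residue scores) can be illustrated on a toy vector. This is only a sketch and not the package's own code: the scores are invented, `summarise_disorder()` is a hypothetical helper, and the 0.5 cutoff is an assumption based on the threshold conventionally used to call a residue disordered in PONDR-style predictors.

```r
# Hypothetical helper (NOT part of mineDisProt): summarise per-residue
# disorder scores as a mean score and a percentage of disordered residues.
# Assumption: a residue counts as disordered when its score is >= cutoff.
summarise_disorder <- function(scores, cutoff = 0.5) {
  stopifnot(is.numeric(scores), length(scores) > 0)
  c(
    mean_score         = mean(scores),
    percent_disordered = 100 * mean(scores >= cutoff)
  )
}

# Invented per-residue scores for a five-residue toy "protein".
toy_scores <- c(0.10, 0.30, 0.55, 0.80, 0.75)
summarise_disorder(toy_scores)
```

Taking `mean()` of the logical vector `scores >= cutoff` gives the fraction of residues at or above the cutoff, so multiplying by 100 yields the percentage directly.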
### extract_string()
This function extracts the number of protein interactions reported by [STRING](https://string-db.org/) at the medium (0.4), high (0.7), and highest (0.9) confidence levels.
```{r example_4, warning=FALSE}
string_data <- extract_string("inst/extdata/string/")
# here are the interaction data for each protein at the 0.4 confidence level
string_data[,1:6]
```
###### * You may get some warnings like "NAs introduced by coercion" if your protein is missing data at higher confidence levels.
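The "NAs introduced by coercion" warning is what base R emits when `as.numeric()` meets text that is not a number. A minimal sketch with invented values (the real STRING files and the package internals may look different) shows where the warning comes from and one way to check which entries ended up as `NA`:

```r
# Invented values: the "highest" entry stands in for a missing count.
raw_counts <- c(medium = "12", high = "7", highest = "none")

# as.numeric() returns NA for non-numeric text and would normally warn;
# the warning is suppressed here because the NA is expected.
counts <- suppressWarnings(as.numeric(raw_counts))
names(counts) <- names(raw_counts)  # as.numeric() drops names

# Which confidence levels have no usable value?
missing_levels <- names(counts)[is.na(counts)]
missing_levels
```

Checking `is.na()` on the returned columns in the same way is a quick way to see which proteins lack data at the higher confidence levels.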
***
Note: In future versions of this package, I hope to fully automate the data collection process.