Title: | Frequency Matrix Approach for Assessing Very Low Frequency Variants in Sequence Records |
---|---|
Description: | Using frequency matrices, very low frequency variants (VLFs) are assessed for amino acid and nucleotide sequences. The VLFs are then compared to see if they occur in only one member of a species, singleton VLFs, or if they occur in multiple members of a species, shared VLFs. The amino acid and nucleotide VLFs are then compared to see if they are concordant with one another. Amino acid VLFs are also assessed to determine if they lead to a change in amino acid residue type, and potential changes to protein structures. Based on Stoeckle and Kerr (2012) <doi:10.1371/journal.pone.0043992>. |
Authors: | Taryn B. T. Athey [aut, cre], Paul D. McNicholas [aut, cre], Jarrett D. Phillips [ctb] |
Maintainer: | Taryn B. T. Athey |
License: | GPL (>= 3) |
Version: | 1.1 |
Built: | 2025-03-18 05:03:11 UTC |
Source: | https://github.com/cran/VLF |
Package: | VLF |
Type: | Package |
Version: | 1.0 |
Date: | 2013-10-25 |
License: | GPL (>=3) |
Main functions: vlfFun(), aminoAcidFun() and concordanceFun().
Taryn B. T. Athey and Paul D. McNicholas
## Not run:
# VLF analysis
data(birds)
bird_vlfAnalysis <- vlfFun(birds)
# Amino acid analysis
data(birds_aminoAcids)
bird_aaAnalysis <- aminoAcidFun(birds_aminoAcids)
# Concordance analysis
nuc_matrix <- bird_vlfAnalysis$VLFmatrix
aa_matrix <- bird_aaAnalysis$VLFmatrix
aa_modal <- bird_aaAnalysis$modal
bird_Concordance <- concordanceFun(nuc_matrix, aa_matrix, 648, 216, aa_modal)
## End(Not run)
Compares amino acid very low frequency variants between specimens of the same species.
aa.compare(x, seqlength)
x |
A list of amino acid sequences separated by species name. |
seqlength |
The length of the amino acid sequences within the list. |
The argument x can be created using the separate function.
A matrix containing two vectors, one with singleton VLF counts for each position of the sequence, and one with shared VLF counts for each position of the sequence.
Taryn B.T. Athey and Paul D. McNicholas
## Not run:
data(birds_aminoAcids)
birds_aminoAcid_speciesNames <- birds_aminoAcids[,2]
aminoAcids_specimenNumber <- nrow(birds_aminoAcids)
birds_aminoAcid_count <- aa.count.function(birds_aminoAcids, 216)
aminoAcid_frequency.Matrix <- aa.frequency.matrix.function(birds_aminoAcid_count, 216)
bird_aminoAcid_frequencies <- aa.specimen.frequencies(aminoAcid_frequency.Matrix, birds_aminoAcids, birds_aminoAcid_speciesNames, 216)
birds_aminoAcid_specimenVLFcount <- aa.VLF.count.spec(bird_aminoAcid_frequencies, 0.001, 216)
birds_aaVLFconvert <- aa.VLF.convert.matrix(birds_aminoAcids, bird_aminoAcid_frequencies, 0.001, 216)
birds_aminoAcidVLFs <- VLF.aminoAcids(birds_aaVLFconvert, birds_aminoAcids, 216)
birds_aaVLFreduced <- aa.VLF.reduced(birds_aminoAcidVLFs, birds_aminoAcid_specimenVLFcount, 216)
birds_aaSpecies <- separate(birds_aaVLFreduced)
birds_aminoAcid_singleAndShared <- aa.find.singles(birds_aaSpecies, 216)
## End(Not run)
Calculates the conservation of the modal amino acids, i.e. the amino acid that occurs most often at each position, in a matrix of amino acid sequences.
aa.conservation_first(modal, p, seqlength)
modal |
A vector of the frequencies for the amino acids in the first modal sequence. |
p |
The conservation level, a frequency between 0 and 1 (e.g. 1 for 100% or 0.999 for 99.9%), against which the amino acid frequencies are compared. |
seqlength |
Length of the amino acid sequence. |
The item modal can be calculated using the aa.MODE.freq function.
A vector that contains how many amino acids from the first modal sequence are conserved at the specified conservation level.
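As a rough illustration of what this count means, the sketch below counts the positions whose modal frequency reaches the conservation level. It is a minimal base-R sketch, not the package's code: `conservation_first_sketch` is a hypothetical name, it returns a single count, and treating a position as conserved when its frequency is greater than or equal to p is an assumption (it makes p = 1 correspond to 100% conservation).

```r
# Hypothetical sketch, not the VLF package's code.
# modal: frequency of the modal amino acid at each position (values in [0, 1]).
# Assumption: a position is conserved when its modal frequency is >= p.
conservation_first_sketch <- function(modal, p) {
  sum(modal >= p)
}

modal_freq <- c(1, 1, 0.9995, 0.98)
n100 <- conservation_first_sketch(modal_freq, 1)      # fully conserved positions
n999 <- conservation_first_sketch(modal_freq, 0.999)  # positions at >= 99.9%
print(c(n100 = n100, n999 = n999))
```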
Taryn B. T. Athey and Paul D. McNicholas
## Not run:
data(birds_aminoAcids)
birds_aminoAcid_count <- aa.count.function(birds_aminoAcids, 216)
aminoAcid_frequency.Matrix <- aa.frequency.matrix.function(birds_aminoAcid_count, 216)
aminoAcid_Modal <- aa.MODE(aminoAcid_frequency.Matrix, 216)
aminoAcid_firstModalFreq <- aa.MODE.freq(aminoAcid_frequency.Matrix, 216)
aminoAcid_firstConservation_100 <- aa.conservation_first(aminoAcid_firstModalFreq, 1, 216)
## End(Not run)
Calculates the conservation of the amino acids that occur first and second most often at each position in a matrix of amino acid sequences.
aa.conservation_two(modal1, modal2, p, seqlength)
modal1 |
A vector of the frequencies for the amino acids in the first modal sequence. |
modal2 |
A vector of the frequencies for the amino acids in the second modal sequence. |
p |
A conservation value for the amino acid frequencies to be compared to. |
seqlength |
The length of the amino acid sequence. |
The argument modal1 can be calculated using the aa.MODE.freq function, and the argument modal2 can be calculated using the aa.MODE.second.freq function.
A vector that contains how many amino acids from the first and second modal sequences are conserved at the specified conservation level.
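The same idea extends to two modal sequences by summing their frequencies at each position. In this hypothetical base-R sketch (`conservation_two_sketch` is not a package function) the greater-than-or-equal comparison is again an assumption; the toy values include a position that only reaches 0.999 once the second modal residue is added.

```r
# Hypothetical sketch, not the VLF package's code.
# Assumption: a position counts as conserved when the first and second modal
# frequencies together reach at least p.
conservation_two_sketch <- function(modal1, modal2, p) {
  sum(modal1 + modal2 >= p)
}

first_freq  <- c(1, 0.998, 0.99)
second_freq <- c(0, 0.0015, 0.005)
n_first <- sum(first_freq >= 0.999)  # first modal residue alone
n_two <- conservation_two_sketch(first_freq, second_freq, 0.999)
print(c(n_first = n_first, n_two = n_two))
```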
Taryn B. T. Athey and Paul D. McNicholas
## Not run:
data(birds_aminoAcids)
birds_aminoAcid_count <- aa.count.function(birds_aminoAcids, 216)
aminoAcid_frequency.Matrix <- aa.frequency.matrix.function(birds_aminoAcid_count, 216)
aminoAcid_Modal <- aa.MODE(aminoAcid_frequency.Matrix, 216)
aminoAcid_firstModalFreq <- aa.MODE.freq(aminoAcid_frequency.Matrix, 216)
aminoAcid_secondModalFreq <- aa.MODE.second.freq(aminoAcid_frequency.Matrix, 216)
aminoAcid_secondConservation_99.9 <- aa.conservation_two(aminoAcid_firstModalFreq, aminoAcid_secondModalFreq, 0.999, 216)
## End(Not run)
Counts the number of each amino acid at each position of the barcode.
aa.count.function(aminoAcids, seqlength)
aminoAcids |
A matrix of barcode amino acid sequences. |
seqlength |
Length of the amino acid sequences. |
The first and second columns of the aminoAcids argument must contain the unique specimen identifier and the species name, respectively, followed by the amino acid sequence.
A matrix containing the number of each amino acid in each position of the sequence. Each row is a different amino acid count, while the columns represent the sequence position.
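A minimal base-R sketch of the counting step, assuming the input layout described above. The function `aa_count_sketch` and the toy alignment are hypothetical; the package's own implementation may differ, for example in how it orders the amino acid rows or handles gaps.

```r
# Hypothetical sketch, not the VLF package's code.
# Input layout: column 1 = specimen ID, column 2 = species name,
# columns 3..(seqlength + 2) = one amino acid per position.
aa_count_sketch <- function(aminoAcids, seqlength) {
  residues <- aminoAcids[, 3:(seqlength + 2), drop = FALSE]
  alphabet <- sort(unique(as.vector(residues)))
  # For each position, count how many specimens carry each amino acid
  counts <- vapply(seq_len(seqlength), function(j) {
    as.integer(table(factor(residues[, j], levels = alphabet)))
  }, integer(length(alphabet)))
  # Rows = amino acids, columns = sequence positions
  matrix(counts, nrow = length(alphabet), dimnames = list(alphabet, NULL))
}

toy <- rbind(
  c("S1", "sp_A", "M", "K", "L"),
  c("S2", "sp_A", "M", "K", "L"),
  c("S3", "sp_B", "M", "R", "L"),
  c("S4", "sp_B", "M", "K", "I")
)
counts <- aa_count_sketch(toy, 3)
print(counts)
```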
Taryn B. T. Athey and Paul D. McNicholas
## Not run:
data(birds_aminoAcids)
birds_aminoAcid_count <- aa.count.function(birds_aminoAcids, 216)
## End(Not run)
Determines the number of shared and singleton amino acid VLFs.
aa.find.singles(aaSpecies, seqlength)
aaSpecies |
List of amino acid sequences separated by species name. |
seqlength |
Length of amino acid sequences. |
The argument aaSpecies contains only amino acid VLFs, and NAs in any other position in the sequence. The list can be created using the separate function.
A matrix containing the number of singleton and shared aaVLFs in each position of the barcode.
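One way to read the singleton/shared split is per species and per position. The hypothetical base-R sketch below assumes that a VLF residue carried by exactly one specimen of a species is a singleton, and that every specimen carrying a residue also found in another specimen of the same species counts as shared. The package's exact counting rule is not stated here and may differ.

```r
# Hypothetical sketch, not the VLF package's code.
# aaSpecies: list with one character matrix per species; rows = specimens,
# columns = positions, VLF residues kept and every other cell NA.
find_singles_sketch <- function(aaSpecies, seqlength) {
  singleton <- integer(seqlength)
  shared <- integer(seqlength)
  for (sp in aaSpecies) {
    for (j in seq_len(seqlength)) {
      v <- sp[, j]
      v <- v[!is.na(v)]
      if (length(v) == 0) next
      tab <- table(v)
      # Residues seen once within the species are singletons
      singleton[j] <- singleton[j] + sum(tab == 1)
      # Specimens carrying a residue seen 2+ times within the species are shared
      shared[j] <- shared[j] + sum(tab[tab > 1])
    }
  }
  rbind(singleton = singleton, shared = shared)
}

aaSpecies <- list(
  sp_A = rbind(c(NA, "R"), c(NA, "R")),
  sp_B = rbind(c("V", NA), c(NA, NA), c(NA, "Q"))
)
sas <- find_singles_sketch(aaSpecies, 2)
print(sas)
```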
Taryn B. T. Athey and Paul D. McNicholas
## Not run:
data(birds_aminoAcids)
birds_aminoAcid_speciesNames <- birds_aminoAcids[,2]
aminoAcids_specimenNumber <- nrow(birds_aminoAcids)
birds_aminoAcid_count <- aa.count.function(birds_aminoAcids, 216)
aminoAcid_frequency.Matrix <- aa.frequency.matrix.function(birds_aminoAcid_count, 216)
bird_aminoAcid_frequencies <- aa.specimen.frequencies(aminoAcid_frequency.Matrix, birds_aminoAcids, birds_aminoAcid_speciesNames, 216)
birds_aminoAcid_specimenVLFcount <- aa.VLF.count.spec(bird_aminoAcid_frequencies, 0.001, 216)
birds_aaVLFconvert <- aa.VLF.convert.matrix(birds_aminoAcids, bird_aminoAcid_frequencies, 0.001, 216)
birds_aminoAcidVLFs <- VLF.aminoAcids(birds_aaVLFconvert, birds_aminoAcids, 216)
birds_aaVLFreduced <- aa.VLF.reduced(birds_aminoAcidVLFs, birds_aminoAcid_specimenVLFcount, 216)
birds_aaSpecies <- separate(birds_aaVLFreduced)
birds_aminoAcid_singleAndShared <- aa.find.singles(birds_aaSpecies, 216)
## End(Not run)
Calculates the frequency of each amino acid at each position of the sequence.
aa.frequency.matrix.function(aa.count, seqlength)
aa.count |
A matrix containing the number of each amino acid in each position. |
seqlength |
The length of the amino acid sequence. |
The aa.count argument can be calculated using the function aa.count.function.
A matrix of the frequencies for each amino acid in each position of the barcode sequence.
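Assuming each frequency is the count divided by the number of specimens at that position (the column total), the conversion is a single `sweep` in base R. `aa_frequency_sketch` is a hypothetical name used for illustration only, not the package's function.

```r
# Hypothetical sketch, not the VLF package's code.
# Converts a count matrix (rows = amino acids, columns = positions) into
# frequencies by dividing each column by its total.
aa_frequency_sketch <- function(aa.count) {
  sweep(aa.count, 2, colSums(aa.count), "/")
}

counts <- matrix(c(0, 4, 0,
                   3, 0, 1),
                 nrow = 3, dimnames = list(c("K", "M", "R"), NULL))
freq <- aa_frequency_sketch(counts)
print(freq)
```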
Taryn B. T. Athey and Paul D. McNicholas
## Not run:
data(birds_aminoAcids)
birds_aminoAcid_count <- aa.count.function(birds_aminoAcids, 216)
aminoAcid_frequency.Matrix <- aa.frequency.matrix.function(birds_aminoAcid_count, 216)
## End(Not run)
Calculates the modal amino acid sequence, i.e. the sequence made up of the amino acid that occurs most often at each position, from a matrix of amino acid sequences.
aa.MODE(freq.matrix, seqlength)
freq.matrix |
Frequency matrix for amino acids. |
seqlength |
Length of amino acid sequences. |
The argument freq.matrix can be calculated using the function aa.frequency.matrix.function.
A vector containing the amino acid sequence that occurs most often.
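The modal sequence can be read off a frequency matrix by taking, at each position, the amino acid with the highest frequency. This is a hypothetical base-R sketch (`aa_mode_sketch` is not a package function); how the package breaks ties is not documented, and here ties go to the first row.

```r
# Hypothetical sketch, not the VLF package's code.
# freq.matrix: rows = amino acids (row names are one-letter codes),
# columns = positions.
aa_mode_sketch <- function(freq.matrix) {
  # which.max returns the first maximum, so ties go to the earlier row
  rownames(freq.matrix)[apply(freq.matrix, 2, which.max)]
}

freq <- matrix(c(0, 1, 0,
                 0.75, 0, 0.25),
               nrow = 3, dimnames = list(c("K", "M", "R"), NULL))
modal <- aa_mode_sketch(freq)
print(modal)
```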
Taryn B. T. Athey and Paul D. McNicholas
## Not run:
data(birds_aminoAcids)
birds_aminoAcid_count <- aa.count.function(birds_aminoAcids, 216)
aminoAcid_frequency.Matrix <- aa.frequency.matrix.function(birds_aminoAcid_count, 216)
aminoAcid_Modal <- aa.MODE(aminoAcid_frequency.Matrix, 216)
## End(Not run)
Returns the frequencies of the amino acids that occur most often in each position of the sequence.
aa.MODE.freq(freq.matrix, seqlength)
freq.matrix |
Frequency matrix for amino acids. |
seqlength |
Length of the amino acid sequences. |
The argument freq.matrix can be calculated using the function aa.frequency.matrix.function.
A vector of frequencies for the first modal sequence.
Taryn B. T. Athey and Paul D. McNicholas
## Not run:
data(birds_aminoAcids)
birds_aminoAcid_count <- aa.count.function(birds_aminoAcids, 216)
aminoAcid_frequency.Matrix <- aa.frequency.matrix.function(birds_aminoAcid_count, 216)
aminoAcid_firstModalFreq <- aa.MODE.freq(aminoAcid_frequency.Matrix, 216)
## End(Not run)
Returns the frequencies of the amino acids that occur second most often in each position of a matrix of amino acid sequences.
aa.MODE.second.freq(freq.matrix, seqlength)
freq.matrix |
Frequency matrix for amino acids. |
seqlength |
Length of amino acid sequences. |
The argument freq.matrix can be calculated using the function aa.frequency.matrix.function.
A vector containing the frequencies of the second modal amino acid sequence.
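A minimal sketch of how the first and second modal frequencies relate, assuming they are simply the largest and second-largest values in each column of the frequency matrix. `aa_mode_freqs_sketch` is hypothetical; note that a position where every specimen carries the same residue has a second modal frequency of 0.

```r
# Hypothetical sketch, not the VLF package's code.
# Sorts each position's frequencies in decreasing order: the largest is the
# first modal frequency, the runner-up is the second modal frequency.
aa_mode_freqs_sketch <- function(freq.matrix) {
  sorted <- apply(freq.matrix, 2, sort, decreasing = TRUE)
  list(first = unname(sorted[1, ]), second = unname(sorted[2, ]))
}

freq <- matrix(c(0, 1, 0,
                 0.75, 0, 0.25),
               nrow = 3, dimnames = list(c("K", "M", "R"), NULL))
mf <- aa_mode_freqs_sketch(freq)
print(mf)
```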
Taryn B. T. Athey and Paul D. McNicholas
## Not run:
data(birds_aminoAcids)
birds_aminoAcid_count <- aa.count.function(birds_aminoAcids, 216)
aminoAcid_frequency.Matrix <- aa.frequency.matrix.function(birds_aminoAcid_count, 216)
aminoAcid_secondModalFreq <- aa.MODE.second.freq(aminoAcid_frequency.Matrix, 216)
## End(Not run)
Converts a matrix of amino acid sequences into a matrix of amino acid frequencies.
aa.specimen.frequencies(freq, seq.matrix, spec.names, seqlength)
freq |
Frequency matrix for amino acids. |
seq.matrix |
Matrix of specimen amino acid sequences. |
spec.names |
A vector of the species names for each specimen in seq.matrix, in the order they appear in the matrix. |
seqlength |
Length of amino acid sequences. |
The argument freq can be calculated using the function aa.frequency.matrix.function.
A matrix in which each amino acid of each specimen's sequence is replaced by that amino acid's frequency at the same position.
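The conversion can be pictured as a lookup: each residue is replaced by its frequency at that position. The hypothetical sketch below (`aa_specimen_freq_sketch`) assumes the layout of aa.count.function's input, uses specimen identifiers as row names, and drops the spec.names argument, so it illustrates the idea rather than reproducing the package's output.

```r
# Hypothetical sketch, not the VLF package's code (it omits spec.names).
# Replaces each specimen's residue at each position with that residue's
# frequency at the same position, looked up in the frequency matrix.
aa_specimen_freq_sketch <- function(freq, seq.matrix, seqlength) {
  residues <- seq.matrix[, 3:(seqlength + 2), drop = FALSE]
  out <- matrix(NA_real_, nrow = nrow(residues), ncol = seqlength,
                dimnames = list(seq.matrix[, 1], NULL))
  for (j in seq_len(seqlength)) {
    out[, j] <- freq[residues[, j], j]
  }
  out
}

toy <- rbind(
  c("S1", "sp_A", "M", "K"),
  c("S2", "sp_A", "M", "K"),
  c("S3", "sp_B", "M", "R"),
  c("S4", "sp_B", "M", "K")
)
freq <- matrix(c(0, 1, 0,
                 0.75, 0, 0.25),
               nrow = 3, dimnames = list(c("K", "M", "R"), NULL))
spec_freq <- aa_specimen_freq_sketch(freq, toy, 2)
print(spec_freq)
```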
Taryn B. T. Athey and Paul D. McNicholas
## Not run:
data(birds_aminoAcids)
birds_aminoAcid_speciesNames <- birds_aminoAcids[,2]
aminoAcids_specimenNumber <- nrow(birds_aminoAcids)
birds_aminoAcid_count <- aa.count.function(birds_aminoAcids, 216)
aminoAcid_frequency.Matrix <- aa.frequency.matrix.function(birds_aminoAcid_count, 216)
bird_aminoAcid_frequencies <- aa.specimen.frequencies(aminoAcid_frequency.Matrix, birds_aminoAcids, birds_aminoAcid_speciesNames, 216)
## End(Not run)
Converts a matrix of amino acid frequencies for each specimen into a matrix consisting of only VLF values and NAs in every non-VLF position.
aa.VLF.convert.matrix(seq.matrix, freq, p, seqlength)
seq.matrix |
A matrix of aligned DNA barcode amino acid sequences. |
freq |
A matrix of amino acid frequencies for each specimen. |
p |
A very low frequency variant cut-off frequency. Any frequency in the freq matrix below this value is considered to be a very low frequency variant. |
seqlength |
The length of the amino acid sequences. |
A matrix of VLF amino acid frequencies, containing only those amino acid frequencies that are below the designated p value, and NAs in every other position of the matrix.
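Thresholding a specimen frequency matrix at p can be sketched in a few lines of base R. `vlf_mask_sketch` is hypothetical and only covers the frequency step; in the package, the converted matrix is then passed to VLF.aminoAcids, as in the examples. The toy data use p = 0.3 because with four specimens no frequency can fall below 0.001.

```r
# Hypothetical sketch, not the VLF package's code.
# Keeps only the frequencies below the cut-off p; every other cell becomes NA.
vlf_mask_sketch <- function(freq, p) {
  out <- freq
  out[!(freq < p)] <- NA
  out
}

spec_freq <- matrix(c(1, 1, 1, 1,
                      0.75, 0.75, 0.25, 0.75),
                    ncol = 2, dimnames = list(c("S1", "S2", "S3", "S4"), NULL))
vlf_only <- vlf_mask_sketch(spec_freq, p = 0.3)
print(vlf_only)
```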
Taryn B. T. Athey and Paul D. McNicholas
## Not run:
data(birds_aminoAcids)
birds_aminoAcid_speciesNames <- birds_aminoAcids[,2]
aminoAcids_specimenNumber <- nrow(birds_aminoAcids)
birds_aminoAcid_count <- aa.count.function(birds_aminoAcids, 216)
aminoAcid_frequency.Matrix <- aa.frequency.matrix.function(birds_aminoAcid_count, 216)
bird_aminoAcid_frequencies <- aa.specimen.frequencies(aminoAcid_frequency.Matrix, birds_aminoAcids, birds_aminoAcid_speciesNames, 216)
birds_aaVLFconvert <- aa.VLF.convert.matrix(birds_aminoAcids, bird_aminoAcid_frequencies, 0.001, 216)
## End(Not run)
Calculates the number of very low frequency variants at each position in a matrix of sequences.
aa.VLF.count.pos(freq, p, seqlength)
freq |
A matrix of frequencies for each specimen. |
p |
A very low frequency variant cut-off frequency. Any frequency in the freq matrix below this value is considered to be a very low frequency variant. |
seqlength |
The length of the amino acid sequences. |
A vector containing the amino acid VLF count for each position of the sequence.
Taryn B. T. Athey and Paul D. McNicholas
## Not run:
data(birds_aminoAcids)
birds_aminoAcid_speciesNames <- birds_aminoAcids[,2]
aminoAcids_specimenNumber <- nrow(birds_aminoAcids)
birds_aminoAcid_count <- aa.count.function(birds_aminoAcids, 216)
aminoAcid_frequency.Matrix <- aa.frequency.matrix.function(birds_aminoAcid_count, 216)
bird_aminoAcid_frequencies <- aa.specimen.frequencies(aminoAcid_frequency.Matrix, birds_aminoAcids, birds_aminoAcid_speciesNames, 216)
birds_aminoAcid_positionVLFcount <- aa.VLF.count.pos(bird_aminoAcid_frequencies, 0.001, 216)
## End(Not run)
Calculates the number of very low frequency variants for each specimen in a matrix of sequences.
aa.VLF.count.spec(freq, p, seqlength)
freq |
A matrix of amino acid frequencies for each specimen. |
p |
A very low frequency variant cut-off frequency. Any frequency in the freq matrix below this value is considered to be a very low frequency variant. |
seqlength |
The length of the amino acid sequences. |
A vector containing the aaVLF count for every specimen.
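Both VLF counts, per specimen (aa.VLF.count.spec) and per position (aa.VLF.count.pos), can be derived from one logical matrix marking the frequencies below p. This is a hypothetical base-R sketch (`vlf_counts_sketch` is not a package function) using a toy cut-off of 0.3, since with four specimens no frequency can fall below 0.001.

```r
# Hypothetical sketch, not the VLF package's code.
# A frequency below p marks a very low frequency variant (VLF).
vlf_counts_sketch <- function(freq, p) {
  is_vlf <- !is.na(freq) & freq < p
  list(per_specimen = rowSums(is_vlf),  # cf. aa.VLF.count.spec
       per_position = colSums(is_vlf))  # cf. aa.VLF.count.pos
}

spec_freq <- matrix(c(0.75, 0.75, 0.75, 0.25,
                      0.75, 0.75, 0.25, 0.75),
                    ncol = 2, dimnames = list(c("S1", "S2", "S3", "S4"), NULL))
vc <- vlf_counts_sketch(spec_freq, p = 0.3)
print(vc)
```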
Taryn B. T. Athey and Paul D. McNicholas
## Not run:
data(birds_aminoAcids)
birds_aminoAcid_speciesNames <- birds_aminoAcids[,2]
aminoAcids_specimenNumber <- nrow(birds_aminoAcids)
birds_aminoAcid_count <- aa.count.function(birds_aminoAcids, 216)
aminoAcid_frequency.Matrix <- aa.frequency.matrix.function(birds_aminoAcid_count, 216)
bird_aminoAcid_frequencies <- aa.specimen.frequencies(aminoAcid_frequency.Matrix, birds_aminoAcids, birds_aminoAcid_speciesNames, 216)
birds_aminoAcid_specimenVLFcount <- aa.VLF.count.spec(bird_aminoAcid_frequencies, 0.001, 216)
## End(Not run)
Reduces a matrix of amino acid very low frequency variants (aaVLFs) so that only the specimens that contain aaVLFs remain.
aa.VLF.reduced(NA.matrix, sCount, seqlength)
NA.matrix |
A matrix with values for amino acid very low frequency variants for each specimen and NAs in all other positions. |
sCount |
A vector for amino acid very low frequency variant (VLF) counts for each specimen in the NA.matrix. |
seqlength |
Length of the amino acid sequences. |
The argument NA.matrix can be calculated using aa.VLF.convert.matrix and VLF.aminoAcids; the sCount argument can be calculated using the aa.VLF.count.spec function.
A matrix containing only specimens with aaVLFs, and only the aaVLF values in their sequences. All other positions of the sequences contain NAs.
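The reduction amounts to keeping the rows with a nonzero VLF count. A hypothetical base-R sketch (`vlf_reduced_sketch` and the toy matrix are illustrative only, not the package's code):

```r
# Hypothetical sketch, not the VLF package's code.
# Drops every specimen (row) whose VLF count is zero.
vlf_reduced_sketch <- function(NA.matrix, sCount) {
  NA.matrix[sCount > 0, , drop = FALSE]
}

vlf_residues <- rbind(S1 = c(NA, NA), S2 = c(NA, NA),
                      S3 = c(NA, "R"), S4 = c("V", NA))
spec_counts <- c(0, 0, 1, 1)
reduced <- vlf_reduced_sketch(vlf_residues, spec_counts)
print(reduced)
```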
Taryn B. T. Athey and Paul D. McNicholas
## Not run:
data(birds_aminoAcids)
birds_aminoAcid_speciesNames <- birds_aminoAcids[,2]
aminoAcids_specimenNumber <- nrow(birds_aminoAcids)
birds_aminoAcid_count <- aa.count.function(birds_aminoAcids, 216)
aminoAcid_frequency.Matrix <- aa.frequency.matrix.function(birds_aminoAcid_count, 216)
bird_aminoAcid_frequencies <- aa.specimen.frequencies(aminoAcid_frequency.Matrix, birds_aminoAcids, birds_aminoAcid_speciesNames, 216)
birds_aminoAcid_specimenVLFcount <- aa.VLF.count.spec(bird_aminoAcid_frequencies, 0.001, 216)
birds_aaVLFconvert <- aa.VLF.convert.matrix(birds_aminoAcids, bird_aminoAcid_frequencies, 0.001, 216)
birds_aminoAcidVLFs <- VLF.aminoAcids(birds_aaVLFconvert, birds_aminoAcids, 216)
birds_aaVLFreduced <- aa.VLF.reduced(birds_aminoAcidVLFs, birds_aminoAcid_specimenVLFcount, 216)
## End(Not run)
Determines how many aaVLFs have changed “type” of amino acid from the modal amino acid sequence. Amino acid types are polar charged, polar uncharged, non-polar, and those with a unique side group.
aaVLFs.to.modalchanges(modal, AminoAcidList, aalength)
modal |
The modal amino acid sequence (i.e., the amino acid sequence that occurs most often based on the amino acid frequency matrix). |
AminoAcidList |
Matrix of VLF amino acid sequences containing only aaVLFs, and NAs everywhere else. |
aalength |
Amino acid sequence length. |
The argument modal can be created using the aa.MODE function. The argument AminoAcidList can be created using the aa.VLF.convert.matrix, VLF.aminoAcids, and aa.VLF.reduced functions.
Two values: sameAll, the number of aaVLFs whose amino acid is the same type as the modal amino acid, and changedAll, the number of aaVLFs whose amino acid type differs from the modal.
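A hypothetical base-R sketch of the type comparison. The mapping of residues to the four types is an assumption based on one common textbook grouping (sources disagree on residues such as H and Y), and `modal_changes_sketch` only illustrates how sameAll and changedAll could be tallied; it is not the package's code.

```r
# Hypothetical sketch, not the VLF package's code.
# Assumed grouping into the four types named above; the package's own
# table may not match this one.
aa_type <- c(
  D = "charged", E = "charged", K = "charged", R = "charged", H = "charged",
  S = "uncharged", T = "uncharged", N = "uncharged", Q = "uncharged",
  Y = "uncharged",
  A = "nonpolar", V = "nonpolar", L = "nonpolar", I = "nonpolar",
  M = "nonpolar", F = "nonpolar", W = "nonpolar",
  C = "unique", G = "unique", P = "unique"
)

modal_changes_sketch <- function(modal, AminoAcidList) {
  same <- 0L
  changed <- 0L
  for (j in seq_along(modal)) {
    v <- AminoAcidList[, j]
    v <- v[!is.na(v)]  # VLF residues at position j
    if (length(v) == 0) next
    is_same <- aa_type[v] == aa_type[[modal[j]]]
    same <- same + sum(is_same)
    changed <- changed + sum(!is_same)
  }
  c(sameAll = same, changedAll = changed)
}

modal <- c("M", "K", "L")
vlfs <- rbind(c(NA, "R", NA), c("T", NA, "I"))
res <- modal_changes_sketch(modal, vlfs)
print(res)
```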
Taryn B. T. Athey and Paul D. McNicholas
## Not run:
data(birds_aminoAcids)
birds_aminoAcid_speciesNames <- birds_aminoAcids[,2]
aminoAcids_specimenNumber <- nrow(birds_aminoAcids)
birds_aminoAcid_count <- aa.count.function(birds_aminoAcids, 216)
aminoAcid_frequency.Matrix <- aa.frequency.matrix.function(birds_aminoAcid_count, 216)
bird_aminoAcid_frequencies <- aa.specimen.frequencies(aminoAcid_frequency.Matrix, birds_aminoAcids, birds_aminoAcid_speciesNames, 216)
aminoAcid_Modal <- aa.MODE(aminoAcid_frequency.Matrix, 216)
birds_aminoAcid_specimenVLFcount <- aa.VLF.count.spec(bird_aminoAcid_frequencies, 0.001, 216)
birds_aaVLFconvert <- aa.VLF.convert.matrix(birds_aminoAcids, bird_aminoAcid_frequencies, 0.001, 216)
birds_aminoAcidVLFs <- VLF.aminoAcids(birds_aaVLFconvert, birds_aminoAcids, 216)
birds_aaVLFreduced <- aa.VLF.reduced(birds_aminoAcidVLFs, birds_aminoAcid_specimenVLFcount, 216)
All_aaType_change <- aaVLFs.to.modalchanges(aminoAcid_Modal, birds_aaVLFreduced, 216)
## End(Not run)
Gives the position and residue of the amino acid VLFs in a matrix containing amino acid VLFs and NAs.
aminoAcid.matching.positions(matchAA, aalength)
matchAA |
A matrix containing aaVLFs and NAs in all other positions of the sequences. |
aalength |
Amino acid sequence length. |
The argument matchAA can be calculated using the find.matching function and taking the first element of the returned list.
A list for each aaVLF with a matching specimen identifier to a ntVLF. The first position in each entry of the list contains the specimen identifier, the second position contains the species name, the third position contains the sequence position of the aaVLF, and the fourth position contains the aaVLF.
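The returned structure can be approximated by scanning for non-NA cells. This hypothetical base-R sketch assumes that matchAA keeps the specimen identifier and species name in its first two columns, which the description of the returned entries suggests but does not confirm; `matching_positions_sketch` is not a package function.

```r
# Hypothetical sketch, not the VLF package's code.
# Assumption: column 1 = specimen ID, column 2 = species name, remaining
# columns = sequence positions holding aaVLFs or NA.
matching_positions_sketch <- function(matchAA) {
  residues <- matchAA[, -(1:2), drop = FALSE]
  hits <- which(!is.na(residues), arr.ind = TRUE)
  hits <- hits[order(hits[, "row"], hits[, "col"]), , drop = FALSE]
  # One entry per aaVLF: ID, species, position, residue
  lapply(seq_len(nrow(hits)), function(k) {
    i <- hits[k, "row"]
    j <- hits[k, "col"]
    c(matchAA[i, 1], matchAA[i, 2], j, residues[i, j])
  })
}

matchAA <- rbind(c("S3", "sp_B", NA, "R", NA),
                 c("S7", "sp_C", "T", NA, NA))
pos <- matching_positions_sketch(matchAA)
print(pos)
```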
Taryn B. T. Athey and Paul D. McNicholas
## Not run:
# Nucleotide VLF analysis
data(birds)
species.names <- birds[,2]
specimen.Number <- nrow(birds)
rownames(birds) <- species.names
Nuc.count <- count.function(birds, specimen.Number, 648)
frequency.matrix <- frequency.matrix.function(Nuc.count, 648)
birdSpec.freq <- specimen.frequencies(frequency.matrix, birds, specimen.Number, species.names, 648)
Bird_specimen_VLFcount <- VLF.count.spec(birdSpec.freq, 0.001, 648)
bird_VLFconvert <- VLF.convert.matrix(birds, birdSpec.freq, 0.001, 648)
bird_VLFnuc <- VLF.nucleotides(bird_VLFconvert, birds, 648)
bird_VLFreduced <- VLF.reduced(bird_VLFnuc, Bird_specimen_VLFcount, 648)
bird_species <- separate(bird_VLFreduced)
birds_singleAndShared <- find.singles(bird_species, 648)
# Amino acid VLF analysis
data(birds_aminoAcids)
birds_aminoAcid_speciesNames <- birds_aminoAcids[,2]
aminoAcids_specimenNumber <- nrow(birds_aminoAcids)
birds_aminoAcid_count <- aa.count.function(birds_aminoAcids, 216)
aminoAcid_frequency.Matrix <- aa.frequency.matrix.function(birds_aminoAcid_count, 216)
bird_aminoAcid_frequencies <- aa.specimen.frequencies(aminoAcid_frequency.Matrix, birds_aminoAcids, birds_aminoAcid_speciesNames, 216)
aminoAcid_Modal <- aa.MODE(aminoAcid_frequency.Matrix, 216)
birds_aminoAcid_specimenVLFcount <- aa.VLF.count.spec(bird_aminoAcid_frequencies, 0.001, 216)
birds_aaVLFconvert <- aa.VLF.convert.matrix(birds_aminoAcids, bird_aminoAcid_frequencies, 0.001, 216)
birds_aminoAcidVLFs <- VLF.aminoAcids(birds_aaVLFconvert, birds_aminoAcids, 216)
birds_aaVLFreduced <- aa.VLF.reduced(birds_aminoAcidVLFs, birds_aminoAcid_specimenVLFcount, 216)
birds_aaSpecies <- separate(birds_aaVLFreduced)
birds_aminoAcid_singleAndShared <- aa.find.singles(birds_aaSpecies, 216)
# Concordance analysis
VLF_match <- find.matching(bird_VLFreduced, birds_aaVLFreduced, 648, 216)
position_matchingNuc <- nucleotide.matching.positions(VLF_match[[2]], 648)
position_matchingAA <- aminoAcid.matching.positions(VLF_match[[1]], 216)
## End(Not run)
Runs the full amino acid VLF analysis for the user and outputs results.
aminoAcidFun(x, p = 0.001, seqlength = 216, own = NULL)
x |
A matrix of amino acid sequences with the first column containing the unique specimen identifier, the second column containing the species name and the remaining columns containing the amino acid sequence. |
p |
A VLF designation frequency cut-off to be used within the analysis. By default p = 0.001. |
seqlength |
The length of the amino acid sequence. By default seqlength = 216. |
own |
Use this argument to analyze the user's own sequences separately from the reference sequences. Like x, it is a matrix of amino acid sequences with the first column containing the unique specimen identifier, the second column containing the species name, and the remaining columns containing the amino acid sequence. By default own = NULL. |
modal |
A vector containing the amino acid sequence that occurs most often in the dataset. |
con100 |
The number of amino acid positions that are 100% conserved in the sequence. |
conp |
The number of amino acid positions that are conserved at the (1 - p) level (99.9% for the default p = 0.001). |
combine |
The number of amino acid positions that are conserved at the (1 - p) level when the frequencies of the first and second modal sequences are combined. |
specimen |
A vector containing the number of VLFs for each specimen in the dataset. |
position |
A vector containing the number of VLFs for each position in the sequences. |
sas |
A matrix containing vectors of single and shared amino acid VLF counts for each position of the sequence. |
VLFmatrix |
A matrix containing only those specimens that have VLFs, with the amino acids at the positions that contain VLFs and NAs in all other positions. |
ownSpecCount |
A vector containing the number of VLFs for each specimen in the user's own dataset. Only appears if own is not NULL. |
ownPosCount |
A vector containing the number of VLFs for each position in the sequences of the user's own dataset. Only appears if own is not NULL. |
ownVLFMatrix |
A matrix containing only those amino acids at the positions that contain VLFs and NAs in all other positions of the sequence. Only appears if own is not NULL. |
ownVLFreduced |
A matrix containing only those specimens that have VLFs, with the amino acids at the positions that contain VLFs and NAs in all other positions of the sequence. Only appears if own is not NULL. |
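For illustration only, the two ideas behind these outputs (flagging residues whose positional frequency falls below p, and taking the most frequent residue at each position as the modal sequence) can be sketched in base R. The helper names flag_vlfs() and modal_sequence() are hypothetical, and the rule that a residue is a VLF when its frequency is strictly below p is an assumption; aminoAcidFun() may differ in detail.

```r
# Hypothetical helpers for illustration; not the VLF package's internals.
# Assumption: a residue is a VLF when its frequency at that position is
# strictly below the cut-off p.
flag_vlfs <- function(seqs, p = 0.001) {
  flags <- matrix(FALSE, nrow(seqs), ncol(seqs))
  for (j in seq_len(ncol(seqs))) {
    freq <- table(seqs[, j]) / nrow(seqs)   # residue frequencies at position j
    flags[, j] <- freq[seqs[, j]] < p       # look up each specimen's residue
  }
  flags
}

# Modal sequence: the most frequent residue at each position
modal_sequence <- function(seqs) {
  apply(seqs, 2, function(col) names(which.max(table(col))))
}

# Toy alignment of four specimens; p = 0.3 is used so that a residue seen
# once in four specimens (frequency 0.25) counts as a VLF in this tiny example
toy <- matrix(c("M", "K", "L",
                "M", "K", "L",
                "M", "R", "L",
                "M", "K", "L"), nrow = 4, byrow = TRUE)
flags <- flag_vlfs(toy, p = 0.3)
which(flags, arr.ind = TRUE)   # row 3, column 2: the R at position 2
modal_sequence(toy)            # "M" "K" "L"
```

With the default p = 0.001, a residue has to be rarer than 1 in 1,000 sequences to be flagged, which is why a realistic run needs a large reference set such as the 11,333 bird barcodes.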
Taryn B. T. Athey and Paul D. McNicholas
## Not run:
data(birds_aminoAcids)
bird_aaAnalysis <- aminoAcidFun(birds_aminoAcids)
## End(Not run)
Data set containing nucleotide sequences for 11,333 bird barcodes.
data(birds)
The format is:
chr [1:11333, 1:650] "gi|359280039|gb|JQ173884.1|" ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:650] "V1" "V2" "V3" "V4" ...
Stoeckle, M. Y. and Kerr, K. C. R. (2012) Frequency Matrix Approach Demonstrates High Sequence Quality in Avian BARCODEs and Highlights Cryptic Pseudogenes. PLoS ONE. 7, e43992.
## Not run:
data(birds)
## End(Not run)
Data set containing amino acid sequences for 11,333 bird barcodes.
data(birds_aminoAcids)
The format is:
chr [1:11333, 1:218] "gi|359280039|gb|JQ173884.1|" ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:218] "V1" "V2" "V3" "V4" ...
Stoeckle, M. Y. and Kerr, K. C. R. (2012) Frequency Matrix Approach Demonstrates High Sequence Quality in Avian BARCODEs and Highlights Cryptic Pseudogenes. PLoS ONE. 7, e43992.
## Not run:
data(birds_aminoAcids)
## End(Not run)
Compares VLFs between specimens of the same species.
compare(x, seqlength)
x |
A list of sequences separated by species name. Each entry in the list contains a matrix of sequences from the same species. |
seqlength |
Length of the sequences. |
The list of sequences by species name, x, can be created using the separate function.
A matrix containing two vectors, one with singleton VLF counts for each position of the sequence, and one with shared VLF counts for each position of the sequence.
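A minimal sketch of the singleton/shared distinction for one species follows. It assumes the species' VLFs are stored as residues with NA everywhere else and that counts are per specimen (two specimens sharing a VLF add 2 to the shared count). count_single_shared() is a hypothetical helper, not the package's compare().

```r
# Hypothetical helper; not the package's compare().
# Input: character matrix for ONE species (rows = specimens, columns =
# positions) holding VLF residues and NA everywhere else.
# Assumption: counts are per specimen.
count_single_shared <- function(vlf) {
  single <- shared <- integer(ncol(vlf))
  for (j in seq_len(ncol(vlf))) {
    res <- vlf[!is.na(vlf[, j]), j]        # VLF residues present at position j
    if (length(res) == 0) next
    n <- table(res)[res]                   # specimens carrying each residue
    single[j] <- sum(n == 1)               # seen in only one specimen
    shared[j] <- sum(n > 1)                # seen in two or more specimens
  }
  rbind(single = single, shared = shared)
}

# Toy species with three specimens and four positions
sp <- matrix(c(NA,  "T", NA,  NA,
               NA,  "T", "G", NA,
               "A", NA,  NA,  NA), nrow = 3, byrow = TRUE)
count_single_shared(sp)   # single: 1 0 1 0, shared: 0 2 0 0
```

Under this rule a species represented by a single specimen contributes only singletons, which is consistent with how find.singles() is described as treating such species.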
Taryn B. T. Athey and Paul D. McNicholas
## Not run:
data(birds)
species.names <- birds[,2]
specimen.Number <- nrow(birds)
rownames(birds) <- species.names
Nuc.count <- count.function(birds, specimen.Number, 648)
frequency.matrix <- ffrequency.matrix.function(Nuc.count, 648)
birdSpec.freq <- specimen.frequencies(frequency.matrix, birds, specimen.Number, species.names, 648)
Bird_specimen_VLFcount <- VLF.count.spec(birdSpec.freq, 0.001, 648)
bird_VLFconvert <- VLF.convert.matrix(birds, birdSpec.freq, 0.001, 648)
bird_VLFnuc <- VLF.nucleotides(bird_VLFconvert, birds, 648)
bird_VLFreduced <- VLF.reduced(bird_VLFnuc, Bird_specimen_VLFcount, 648)
bird_species <- separate(bird_VLFreduced)
# The compare function is called from within the find.singles function
birds_singleAndShared <- find.singles(bird_species, 648)
## End(Not run)
Compares ntVLFs to aaVLFs to see if they are concordant (i.e., if the ntVLF causes the aaVLF).
concordanceFun(nuc, aa, nuclength = 648, aalength = 216, aminoAcid_Modal)
nuc |
A matrix of ntVLFs containing only those specimens with VLFs; each sequence holds the VLF nucleotides and NAs in all other positions of the nucleotide sequence. |
aa |
A matrix of aaVLFs containing only those specimens with VLFs; each sequence holds the VLF amino acids and NAs in all other positions of the amino acid sequence. |
nuclength |
The length of the nucleotide sequence. By default nuclength = 648. |
aalength |
The length of the amino acid sequence. By default aalength = 216. |
aminoAcid_Modal |
The modal amino acid sequence (i.e., the amino acid sequence that occurs most often in the given sequences). |
The argument nuc can be taken from the VLFmatrix output from the vlfFun function. The argument aa can be taken from the VLFmatrix output from the aminoAcidFun function. The argument aminoAcid_Modal can be taken from the modal output from the aminoAcidFun function.
matched |
A list of the concordant ntVLFs and aaVLFs. Contains the specimen identifier, the species name, the concordant amino acid, the amino acid position, and the concordant amino acid position. There may be multiple entries for the same aaVLF if that VLF is concordant with more than one ntVLF. |
codons |
A vector giving the number of concordant amino acid changes caused by ntVLFs in each codon position. |
concordantType |
Contains information on how many of the concordant aaVLFs had a change in amino acid residue type and how many remained in the same amino acid residue category. |
aminoAcidType |
Contains information on how many of the aaVLFs had a change in amino acid residue type and how many remained in the same amino acid residue category. |
concordNuc |
Gives the number of ntVLFs that showed concordance to aaVLFs. |
concordAA |
Gives the number of aaVLFs that showed concordance to ntVLFs. |
sequences |
Gives the number of sequences that had both ntVLFs and aaVLFs. |
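The codon arithmetic behind concordance can be sketched as follows: an ntVLF at nucleotide position n falls in codon ceiling(n / 3), so it is paired with any aaVLF in the same specimen at that amino acid position. The data-frame layout and find_concordant() are hypothetical; concordanceFun() itself works on the VLF matrices described above.

```r
# Hypothetical helper; not the package's concordanceFun().
# nt: data frame of ntVLFs (specimen, nt_position)
# aa: data frame of aaVLFs (specimen, aa_position)
find_concordant <- function(nt, aa) {
  nt$aa_position <- ceiling(nt$nt_position / 3)     # codon containing the ntVLF
  merge(nt, aa, by = c("specimen", "aa_position"))  # same specimen, same codon
}

nt_vlfs <- data.frame(specimen = c("s1", "s1", "s2"), nt_position = c(4, 10, 7))
aa_vlfs <- data.frame(specimen = c("s1", "s2"), aa_position = c(2, 5))
find_concordant(nt_vlfs, aa_vlfs)   # one match: s1, codon 2, nucleotide 4
```

Because merge() returns one row per matching pair, an aaVLF matched by two ntVLFs in the same codon appears twice, mirroring the note on multiple entries in the matched output.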
Taryn B. T. Athey and Paul D. McNicholas
## Not run:
# VLF analysis
data(birds)
bird_vlfAnalysis <- vlfFun(birds)
# Amino acid analysis
data(birds_aminoAcids)
bird_aaAnalysis <- aminoAcidFun(birds_aminoAcids)
# Concordance analysis
bird_Concordance <- concordanceFun(bird_vlfAnalysis$VLFmatrix, bird_aaAnalysis$VLFmatrix, 648, 216, bird_aaAnalysis$modal)
## End(Not run)
Determines how many concordant aaVLFs have changed amino acid residue type relative to the modal amino acid sequence. The residue types are polar charged, polar uncharged, non-polar, and amino acids with a unique side group.
concordant.to.modalchanges(matched, modal)
matched |
A list containing the concordant aaVLFs and their properties (e.g., sequence position). |
modal |
A vector containing the modal amino acid sequence. |
The matched argument can be calculated using the overall.matched function. The modal argument can be calculated using the aa.MODE function.
A vector containing the number of concordant aaVLFs that changed amino acid residue type and the number that retained the same residue type.
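A sketch of the residue-type comparison is shown below. The grouping is an assumption based on a common textbook classification (cysteine, glycine, and proline as the "unique side group" class); the package's own table may place some residues, such as histidine or tyrosine, differently. residue_type and type_changes() are hypothetical names.

```r
# ASSUMED grouping of the 20 amino acids into the four residue types
residue_type <- c(
  D = "polar charged", E = "polar charged", K = "polar charged",
  R = "polar charged", H = "polar charged",
  S = "polar uncharged", T = "polar uncharged", N = "polar uncharged",
  Q = "polar uncharged", Y = "polar uncharged",
  A = "non-polar", V = "non-polar", L = "non-polar", I = "non-polar",
  M = "non-polar", F = "non-polar", W = "non-polar",
  C = "unique", G = "unique", P = "unique")

# Hypothetical helper: compare each VLF residue with the modal residue at
# the same position and count type changes versus same-type substitutions
type_changes <- function(vlf_residues, modal_residues) {
  changed <- residue_type[vlf_residues] != residue_type[modal_residues]
  c(changed = sum(changed), same = sum(!changed))
}

# K->R keeps its type; L->P and S->D change type
type_changes(c("R", "P", "D"), c("K", "L", "S"))   # changed = 2, same = 1
```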
Taryn B. T. Athey and Paul D. McNicholas
## Not run:
# Nucleotide VLF analysis
data(birds)
species.names <- birds[,2]
specimen.Number <- nrow(birds)
rownames(birds) <- species.names
Nuc.count <- count.function(birds, specimen.Number, 648)
frequency.matrix <- ffrequency.matrix.function(Nuc.count, 648)
birdSpec.freq <- specimen.frequencies(frequency.matrix, birds, specimen.Number, species.names, 648)
Bird_specimen_VLFcount <- VLF.count.spec(birdSpec.freq, 0.001, 648)
bird_VLFconvert <- VLF.convert.matrix(birds, birdSpec.freq, 0.001, 648)
bird_VLFnuc <- VLF.nucleotides(bird_VLFconvert, birds, 648)
bird_VLFreduced <- VLF.reduced(bird_VLFnuc, Bird_specimen_VLFcount, 648)
bird_species <- separate(bird_VLFreduced)
birds_singleAndShared <- find.singles(bird_species, 648)
# Amino acid VLF analysis
data(birds_aminoAcids)
birds_aminoAcid_speciesNames <- birds_aminoAcids[,2]
aminoAcids_specimenNumber <- nrow(birds_aminoAcids)
birds_aminoAcid_count <- aa.count.function(birds_aminoAcids, 216)
aminoAcid_frequency.Matrix <- aa.frequency.matrix.function(birds_aminoAcid_count, 216)
bird_aminoAcid_frequencies <- aa.specimen.frequencies(aminoAcid_frequency.Matrix, birds_aminoAcids, birds_aminoAcid_speciesNames, 216)
aminoAcid_Modal <- aa.MODE(aminoAcid_frequency.Matrix, 216)
birds_aminoAcid_specimenVLFcount <- aa.VLF.count.spec(bird_aminoAcid_frequencies, 0.001, 216)
birds_aaVLFconvert <- aa.VLF.convert.matrix(birds_aminoAcids, bird_aminoAcid_frequencies, 0.001, 216)
birds_aminoAcidVLFs <- VLF.aminoAcids(birds_aaVLFconvert, birds_aminoAcids, 216)
birds_aaVLFreduced <- aa.VLF.reduced(birds_aminoAcidVLFs, birds_aminoAcid_specimenVLFcount, 216)
birds_aaSpecies <- separate(birds_aaVLFreduced)
birds_aminoAcid_singleAndShared <- aa.find.singles(birds_aaSpecies, 216)
# Concordance analysis
VLF_match <- find.matching(bird_VLFreduced, birds_aaVLFreduced, 648, 216)
position_matchingNuc <- nucleotide.matching.positions(VLF_match[[2]], 648)
position_matchingAA <- aminoAcid.matching.positions(VLF_match[[1]], 216)
matching_comparison <- overall.matched(position_matchingNuc, position_matchingAA, 648, 216)
concordant_aaType_change <- concordant.to.modalchanges(matching_comparison, aminoAcid_Modal)
## End(Not run)
Calculates the conservation of the nucleotides that occur most often in a matrix of sequences.
conservation_first(modal, p, seqlength)
modal |
A vector of the frequencies of the nucleotides in the first modal sequence. |
p |
The conservation threshold that the nucleotide frequencies are compared against (e.g., 1 for 100% or 0.999 for 99.9%). |
seqlength |
The length of the nucleotide sequence. |
The argument modal can be calculated using the MODE.freq function.
A vector that contains how many nucleotides from the first modal sequence are conserved at the specified conservation level for each codon position.
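A minimal sketch of per-codon-position conservation counts is given below. It assumes 1-based positions, so the codon position of site i is ((i - 1) mod 3) + 1, and treats a site as conserved when its modal frequency is at least p (so p = 1 means 100% conserved). conserved_by_codon() is hypothetical, not the package's conservation_first().

```r
# Hypothetical helper; not the package's conservation_first().
# Assumption: "conserved" means modal frequency >= p.
conserved_by_codon <- function(modal_freq, p) {
  codon_pos <- (seq_along(modal_freq) - 1) %% 3 + 1   # 1, 2, 3, 1, 2, 3, ...
  tapply(modal_freq >= p, codon_pos, sum)             # conserved sites per codon position
}

f <- c(1, 1, 0.99, 1, 0.9995, 1)   # toy modal frequencies for two codons
conserved_by_codon(f, 1)           # codon positions 1, 2, 3: 2 1 1
conserved_by_codon(f, 0.999)       # 2 2 1
```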
Taryn B. T. Athey and Paul D. McNicholas
## Not run:
data(birds)
species.names <- birds[,2]
specimen.Number <- nrow(birds)
rownames(birds) <- species.names
Nuc.count <- count.function(birds, specimen.Number, 648)
frequency.matrix <- ffrequency.matrix.function(Nuc.count, 648)
birdSpec.freq <- specimen.frequencies(frequency.matrix, birds, specimen.Number, species.names, 648)
nucleotide.modalSequence <- MODE(frequency.matrix, 648)
Bird_first.modal.frequencies <- MODE.freq(frequency.matrix, 648)
First_conserved_100 <- conservation_first(Bird_first.modal.frequencies, 1, 648)
First_conserved_99.9 <- conservation_first(Bird_first.modal.frequencies, 0.999, 648)
## End(Not run)
Calculates the conservation of the nucleotides that occur first and second most often in a matrix of sequences.
conservation_two(modal1, modal2, p, seqlength)
modal1 |
A vector of the frequencies for the nucleotides in the first modal sequence. |
modal2 |
A vector of the frequencies for the nucleotides that occur second most often. |
p |
The conservation threshold that the combined nucleotide frequencies are compared against (e.g., 0.999 for 99.9%). |
seqlength |
The nucleotide sequence length. |
The argument modal1 can be calculated using the function MODE.freq. The argument modal2 can be calculated using the function MODE.second.freq.
A vector that contains how many nucleotides from the first and second modal sequences are conserved at the specified conservation level for each codon position.
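The two-modal variant can be sketched the same way by adding the first and second modal frequencies at each site before comparing with p. conserved_two_by_codon() is a hypothetical helper, and the >= comparison is an assumption.

```r
# Hypothetical helper; not the package's conservation_two().
# A site counts as conserved when modal1 + modal2 >= p (assumption).
conserved_two_by_codon <- function(modal1, modal2, p) {
  codon_pos <- (seq_along(modal1) - 1) %% 3 + 1   # codon position of each site
  tapply(modal1 + modal2 >= p, codon_pos, sum)
}

m1 <- c(0.998, 1, 0.99)      # toy first-modal frequencies for one codon
m2 <- c(0.0015, 0, 0.005)    # toy second-modal frequencies
conserved_two_by_codon(m1, m2, 0.999)   # 1 1 0
```

The first site is not conserved at 99.9% on its modal frequency alone (0.998) but passes once the second most common nucleotide is added, which is the situation the combined count is meant to capture.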
Taryn B. T. Athey and Paul D. McNicholas
## Not run:
data(birds)
species.names <- birds[,2]
specimen.Number <- nrow(birds)
rownames(birds) <- species.names
Nuc.count <- count.function(birds, specimen.Number, 648)
frequency.matrix <- ffrequency.matrix.function(Nuc.count, 648)
birdSpec.freq <- specimen.frequencies(frequency.matrix, birds, specimen.Number, species.names, 648)
nucleotide.modalSequence <- MODE(frequency.matrix, 648)
Bird_first.modal.frequencies <- MODE.freq(frequency.matrix, 648)
Bird_second.modal.frequencies <- MODE.second.freq(frequency.matrix, 648)
FirstAndSecond_conserved_99.9 <- conservation_two(Bird_first.modal.frequencies, Bird_second.modal.frequencies, 0.999, 648)
## End(Not run)
Counts the number of each dNTP in each position of an aligned barcode matrix.
count.function(nucleotides, spec.no, seqlength)
nucleotides |
A matrix of aligned DNA barcode sequences. DNA sequences should start at the third column of the matrix, while the first column contains a unique specimen identifier and the second column contains the species name. |
spec.no |
The number of specimens (sequences) in the nucleotide matrix. |
seqlength |
The length of the nucleotide sequences. |
A matrix containing the number of each nucleotide in each position of the sequence. Each row is a different dNTP count, while the columns represent the sequence position.
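A base-R sketch of the counting step follows. It assumes the sequence occupies columns 3 to seqlength + 2 and that gaps or ambiguity codes are simply left uncounted. count_nucleotides() is hypothetical and not the package's count.function().

```r
# Hypothetical helper; not the package's count.function().
# Rows of the result are A, C, G, T; columns are sequence positions.
count_nucleotides <- function(nucleotides, seqlength) {
  bases <- c("A", "C", "G", "T")
  seqs <- nucleotides[, 3:(seqlength + 2), drop = FALSE]   # skip id and species columns
  counts <- vapply(seq_len(seqlength),
                   function(j) as.integer(table(factor(seqs[, j], levels = bases))),
                   integer(length(bases)))   # non-ACGT symbols become NA and are not counted
  rownames(counts) <- bases
  counts
}

toy <- rbind(c("id1", "sp1", "A", "C", "G"),
             c("id2", "sp1", "A", "T", "G"),
             c("id3", "sp2", "A", "C", "-"))
count_nucleotides(toy, 3)   # position 1: A = 3; position 2: C = 2, T = 1; position 3: G = 2
```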
Taryn B. T. Athey and Paul D. McNicholas
## Not run:
data(birds)
specimen.Number <- nrow(birds)
Nuc.count <- count.function(birds, specimen.Number, 648)
## End(Not run)
Creates a plot of VLF distributions summed for each decile segment.
Decile.Plot(VLF, seqlength)
VLF |
The VLF counts for each barcode position. May be a matrix containing vectors of singleton and shared VLF counts, or a single vector of total VLF counts. |
seqlength |
The length of the sequence. Usually 648 for nucleotide sequences and 216 for amino acid sequences. |
A barplot containing the sum of the VLFs for each decile barcode segment.
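A sketch of the decile summation is given below, assuming position i of an n-position sequence falls in segment ceiling(10 i / n); how Decile.Plot() splits a length that is not divisible by 10 is not stated here. decile_sums() is a hypothetical helper.

```r
# Hypothetical helper; not the package's Decile.Plot().
# Sums per-position VLF counts into 10 consecutive segments.
decile_sums <- function(vlf_counts) {
  n <- length(vlf_counts)
  decile <- ceiling(seq_len(n) * 10 / n)   # segment (1 to 10) of each position
  tapply(vlf_counts, factor(decile, levels = 1:10), sum)
}

counts <- rep(1, 20)
decile_sums(counts)   # each of the 10 segments sums to 2
# barplot(decile_sums(counts)) would draw the corresponding bar chart
```

For a singleton/shared matrix with one row per VLF type, apply(x, 1, decile_sums) gives one column of decile sums per row.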
Taryn B. T. Athey and Paul D. McNicholas
## Not run:
data(birds)
species.names <- birds[,2]
specimen.Number <- nrow(birds)
rownames(birds) <- species.names
Nuc.count <- count.function(birds, specimen.Number, 648)
frequency.matrix <- ffrequency.matrix.function(Nuc.count, 648)
birdSpec.freq <- specimen.frequencies(frequency.matrix, birds, specimen.Number, species.names, 648)
Bird_position_VLFcount <- VLF.count.pos(birdSpec.freq, 0.001, 648)
Decile.Plot(Bird_position_VLFcount, 648)
## End(Not run)
Calculates the singleton, shared, and total error rates for a matrix of sequences, based on the VLFs found at second codon positions.
Error.Rate(single, shared, spec, seqlength)
single |
A vector of singleton very low frequency variant (VLF) counts for each position in the sequence. |
shared |
A vector of shared very low frequency variant (VLF) counts for each position in the sequence. |
spec |
The number of specimens in the dataset. |
seqlength |
The length of the barcode sequence. |
The arguments single and shared can be calculated simultaneously using the find.singles function. The spec argument can be calculated by using the nrow() function on the sequence matrix.
A vector containing the single, shared, and total error rate based on the number of second position VLFs.
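The selection of second codon positions can be sketched as below. The denominator (number of specimens times number of second-position sites) is an ASSUMED normalization chosen for illustration; the exact formula used by Error.Rate() is not given in this entry. second_position_rates() is hypothetical.

```r
# Hypothetical helper; not the package's Error.Rate().
second_position_rates <- function(single, shared, spec) {
  second <- seq(2, length(single), by = 3)   # codon positions 2, 5, 8, ...
  sites <- spec * length(second)             # ASSUMED denominator
  s1 <- sum(single[second])                  # singleton VLFs at second positions
  s2 <- sum(shared[second])                  # shared VLFs at second positions
  c(single = s1 / sites, shared = s2 / sites, total = (s1 + s2) / sites)
}

single <- c(0, 1, 0, 0, 2, 0)   # toy singleton counts for two codons
shared <- c(1, 0, 0, 0, 1, 0)   # toy shared counts
second_position_rates(single, shared, spec = 10)   # 0.15 0.05 0.20
```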
Taryn B. T. Athey and Paul D. McNicholas
## Not run:
data(birds)
species.names <- birds[,2]
specimen.Number <- nrow(birds)
rownames(birds) <- species.names
Nuc.count <- count.function(birds, specimen.Number, 648)
frequency.matrix <- ffrequency.matrix.function(Nuc.count, 648)
birdSpec.freq <- specimen.frequencies(frequency.matrix, birds, specimen.Number, species.names, 648)
Bird_specimen_VLFcount <- VLF.count.spec(birdSpec.freq, 0.001, 648)
Bird_position_VLFcount <- VLF.count.pos(birdSpec.freq, 0.001, 648)
bird_VLFconvert <- VLF.convert.matrix(birds, birdSpec.freq, 0.001, 648)
bird_VLFnuc <- VLF.nucleotides(bird_VLFconvert, birds, 648)
bird_VLFreduced <- VLF.reduced(bird_VLFnuc, Bird_specimen_VLFcount, 648)
bird_species <- separate(bird_VLFreduced)
birds_singleAndShared <- find.singles(bird_species, 648)
Bird_error <- Error.Rate(birds_singleAndShared[1,], birds_singleAndShared[2,], specimen.Number, 648)
## End(Not run)
Reads in a FASTA file and converts it into a sequence matrix.
fasta.read(file, seqlength = 648, pos1 = 1, pos2 = 3)
file |
A fasta file to be read in. |
seqlength |
Length of sequence. |
pos1 |
The position within the fasta title of the unique specimen identifier. By default pos1 = 1. |
pos2 |
The position within the fasta title of the species name. By default pos2 = 3. |
A matrix of sequences, with the unique specimen identifiers in the first column, the species names in the second column, and the sequence starting in the third column.
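A sketch of building this matrix from FASTA text is shown below. It assumes header fields are separated by "|" (so pos1 and pos2 index those fields), that every header is followed by at least one sequence line, and that sequences may wrap over several lines. parse_fasta() and the toy headers are hypothetical; the package's fasta.read() reads from a file and may parse headers differently.

```r
# Hypothetical helper; not the package's fasta.read().
parse_fasta <- function(lines, seqlength = 648, pos1 = 1, pos2 = 3) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]                  # drop blank lines
  is_header <- startsWith(lines, ">")
  record <- cumsum(is_header)                    # record index of each line
  fields <- strsplit(sub("^>", "", lines[is_header]), "|", fixed = TRUE)
  seqs <- vapply(split(lines[!is_header], record[!is_header]),
                 paste, character(1), collapse = "")   # join wrapped lines
  residues <- t(vapply(seqs, function(s) substring(s, 1:seqlength, 1:seqlength),
                       character(seqlength)))          # one character per column
  cbind(vapply(fields, `[`, character(1), pos1),       # identifiers
        vapply(fields, `[`, character(1), pos2),       # species names
        residues)
}

fa <- c(">JQ1|COI|Turdus migratorius", "ACGT", "AC",
        ">JQ2|COI|Corvus corax", "ACGTAA")
m <- parse_fasta(fa, seqlength = 6)
dim(m)   # 2 8
```

With real data, readLines() on the file would supply the lines vector.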
Taryn B. T. Athey and Paul D. McNicholas
Calculates the frequency of each dNTP in each position of a nucleotide count matrix.
ffrequency.matrix.function(count.matrix, seqlength)
count.matrix |
A matrix of the counts for each dNTP from a matrix of aligned sequences. |
seqlength |
Length of sequences. |
The argument count.matrix can be calculated using the function count.function.
A matrix of the frequencies for each dNTP in each position of the barcode sequence.
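Converting counts to frequencies amounts to dividing each position's counts by a total. The sketch below divides by each column's own total, which is an assumption (the package may divide by the number of specimens instead). counts_to_frequencies() is hypothetical.

```r
# Hypothetical helper; not the package's ffrequency.matrix.function().
counts_to_frequencies <- function(count.matrix) {
  # divide every column by its total; an all-zero column would give NaN
  sweep(count.matrix, 2, colSums(count.matrix), "/")
}

counts <- matrix(c(3, 0, 0, 0,
                   0, 2, 0, 1), nrow = 4,
                 dimnames = list(c("A", "C", "G", "T"), NULL))
freq <- counts_to_frequencies(counts)
freq[, 2]   # A 0, C about 0.667, G 0, T about 0.333
```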
Taryn B. T. Athey and Paul D. McNicholas
## Not run:
# Nucleotide VLF analysis
data(birds)
specimen.Number <- nrow(birds)
Nuc.count <- count.function(birds, specimen.Number, 648)
frequency.matrix <- ffrequency.matrix.function(Nuc.count, 648)
## End(Not run)
Compares a list of aaVLF and ntVLF matrices for common specimen identifiers.
find.matching(NucleotideList, AminoAcidList, nuclength, aalength)
NucleotideList |
Matrix of VLF nucleotide sequences containing only the nucleotides that are VLFs and NAs in all other positions of the sequences. |
AminoAcidList |
Matrix of VLF amino acid sequences containing only the aaVLFs and NAs in the other positions of the sequences. |
nuclength |
Length of the nucleotide sequence (should be 3X the length of the amino acid sequence). |
aalength |
Length of the amino acid sequence (should be 1/3 the length of the nucleotide sequence). |
The argument NucleotideList can be calculated using the VLF.convert.matrix, VLF.nucleotides, and VLF.reduced functions. The argument AminoAcidList can be calculated using the aa.VLF.convert.matrix, VLF.aminoAcids, and aa.VLF.reduced functions.
A list containing the matrices of aaVLFs (first element) and ntVLFs (second element) for specimens with matching identifiers.
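Matching on specimen identifiers can be sketched with intersect(), assuming the identifier is in the first column of both matrices and keeping the output order described above (amino acids first, nucleotides second). match_specimens() is a hypothetical helper, not the package's find.matching().

```r
# Hypothetical helper; not the package's find.matching().
match_specimens <- function(nuc, aa) {
  common <- intersect(nuc[, 1], aa[, 1])          # identifiers present in both
  list(aa[aa[, 1] %in% common, , drop = FALSE],   # aaVLF rows for shared ids
       nuc[nuc[, 1] %in% common, , drop = FALSE]) # ntVLF rows for shared ids
}

nuc <- rbind(c("id1", "sp1", "A"), c("id2", "sp1", "G"))
aa  <- rbind(c("id2", "sp1", "K"), c("id3", "sp2", "R"))
m <- match_specimens(nuc, aa)   # only id2 appears in both
```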
Taryn B. T. Athey and Paul D. McNicholas
## Not run:
# Nucleotide VLF analysis
data(birds)
species.names <- birds[,2]
specimen.Number <- nrow(birds)
rownames(birds) <- species.names
Nuc.count <- count.function(birds, specimen.Number, 648)
frequency.matrix <- ffrequency.matrix.function(Nuc.count, 648)
birdSpec.freq <- specimen.frequencies(frequency.matrix, birds, specimen.Number, species.names, 648)
Bird_specimen_VLFcount <- VLF.count.spec(birdSpec.freq, 0.001, 648)
bird_VLFconvert <- VLF.convert.matrix(birds, birdSpec.freq, 0.001, 648)
bird_VLFnuc <- VLF.nucleotides(bird_VLFconvert, birds, 648)
bird_VLFreduced <- VLF.reduced(bird_VLFnuc, Bird_specimen_VLFcount, 648)
bird_species <- separate(bird_VLFreduced)
birds_singleAndShared <- find.singles(bird_species, 648)
# Amino acid VLF analysis
data(birds_aminoAcids)
birds_aminoAcid_speciesNames <- birds_aminoAcids[,2]
aminoAcids_specimenNumber <- nrow(birds_aminoAcids)
birds_aminoAcid_count <- aa.count.function(birds_aminoAcids, 216)
aminoAcid_frequency.Matrix <- aa.frequency.matrix.function(birds_aminoAcid_count, 216)
bird_aminoAcid_frequencies <- aa.specimen.frequencies(aminoAcid_frequency.Matrix, birds_aminoAcids, birds_aminoAcid_speciesNames, 216)
aminoAcid_Modal <- aa.MODE(aminoAcid_frequency.Matrix, 216)
birds_aminoAcid_specimenVLFcount <- aa.VLF.count.spec(bird_aminoAcid_frequencies, 0.001, 216)
birds_aaVLFconvert <- aa.VLF.convert.matrix(birds_aminoAcids, bird_aminoAcid_frequencies, 0.001, 216)
birds_aminoAcidVLFs <- VLF.aminoAcids(birds_aaVLFconvert, birds_aminoAcids, 216)
birds_aaVLFreduced <- aa.VLF.reduced(birds_aminoAcidVLFs, birds_aminoAcid_specimenVLFcount, 216)
birds_aaSpecies <- separate(birds_aaVLFreduced)
birds_aminoAcid_singleAndShared <- aa.find.singles(birds_aaSpecies, 216)
# Concordance analysis
VLF_match <- find.matching(bird_VLFreduced, birds_aaVLFreduced, 648, 216)
## End(Not run)
Calculates the number of singleton and shared VLFs for each position of the nucleotide sequence. It first checks whether a species is represented by only one specimen; for species with multiple specimens, it calls the compare() function to count singleton and shared VLFs.
find.singles(species, seqlength)
species |
A list of sequences separated by species name. Each entry in the list contains a matrix of sequences from the same species. |
seqlength |
Length of the nucleotide sequence. |
The argument species can be calculated using the separate function.
A matrix containing the number of singleton and shared ntVLFs in each position of the barcode.
Taryn B. T. Athey and Paul D. McNicholas
## Not run:
# Nucleotide VLF analysis
data(birds)
species.names <- birds[,2]
specimen.Number <- nrow(birds)
rownames(birds) <- species.names
Nuc.count <- count.function(birds, specimen.Number, 648)
frequency.matrix <- ffrequency.matrix.function(Nuc.count, 648)
birdSpec.freq <- specimen.frequencies(frequency.matrix, birds, specimen.Number, species.names, 648)
Bird_specimen_VLFcount <- VLF.count.spec(birdSpec.freq, 0.001, 648)
bird_VLFconvert <- VLF.convert.matrix(birds, birdSpec.freq, 0.001, 648)
bird_VLFnuc <- VLF.nucleotides(bird_VLFconvert, birds, 648)
bird_VLFreduced <- VLF.reduced(bird_VLFnuc, Bird_specimen_VLFcount, 648)
bird_species <- separate(bird_VLFreduced)
birds_singleAndShared <- find.singles(bird_species, 648)
## End(Not run)
Counts, for each codon position, how many concordant aaVLFs were caused by ntVLFs at that position.
matched.codon.position(matched)
matched |
A list of the nucleotide position of concordant ntVLFs and their associated aaVLFs. |
The argument matched can be calculated using the function overall.matched.
A vector containing the number of concordant VLFs caused by ntVLFs in each codon position.
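The tally can be sketched as follows, assuming 1-based nucleotide positions so that the codon position of nucleotide n is ((n - 1) mod 3) + 1. codon_position() and tally_codon_positions() are hypothetical helpers, not the package's matched.codon.position().

```r
# Hypothetical helpers; not the package's matched.codon.position().
codon_position <- function(nt_position) (nt_position - 1) %% 3 + 1

tally_codon_positions <- function(nt_positions) {
  # factor levels 1:3 keep a zero count for positions that never occur
  table(factor(codon_position(nt_positions), levels = 1:3))
}

# Nucleotide positions of toy concordant ntVLFs
tally_codon_positions(c(4, 10, 7, 12, 2))   # first: 3, second: 1, third: 1
```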
Taryn B. T. Athey and Paul D. McNicholas
## Not run:
# Nucleotide VLF analysis
data(birds)
species.names <- birds[,2]
specimen.Number <- nrow(birds)
rownames(birds) <- species.names
Nuc.count <- count.function(birds, specimen.Number, 648)
frequency.matrix <- ffrequency.matrix.function(Nuc.count, 648)
birdSpec.freq <- specimen.frequencies(frequency.matrix, birds, specimen.Number, species.names, 648)
Bird_specimen_VLFcount <- VLF.count.spec(birdSpec.freq, 0.001, 648)
bird_VLFconvert <- VLF.convert.matrix(birds, birdSpec.freq, 0.001, 648)
bird_VLFnuc <- VLF.nucleotides(bird_VLFconvert, birds, 648)
bird_VLFreduced <- VLF.reduced(bird_VLFnuc, Bird_specimen_VLFcount, 648)
bird_species <- separate(bird_VLFreduced)
birds_singleAndShared <- find.singles(bird_species, 648)
# Amino acid VLF analysis
data(birds_aminoAcids)
birds_aminoAcid_speciesNames <- birds_aminoAcids[,2]
aminoAcids_specimenNumber <- nrow(birds_aminoAcids)
birds_aminoAcid_count <- aa.count.function(birds_aminoAcids, 216)
aminoAcid_frequency.Matrix <- aa.frequency.matrix.function(birds_aminoAcid_count, 216)
bird_aminoAcid_frequencies <- aa.specimen.frequencies(aminoAcid_frequency.Matrix, birds_aminoAcids, birds_aminoAcid_speciesNames, 216)
aminoAcid_Modal <- aa.MODE(aminoAcid_frequency.Matrix, 216)
birds_aminoAcid_specimenVLFcount <- aa.VLF.count.spec(bird_aminoAcid_frequencies, 0.001, 216)
birds_aaVLFconvert <- aa.VLF.convert.matrix(birds_aminoAcids, bird_aminoAcid_frequencies, 0.001, 216)
birds_aminoAcidVLFs <- VLF.aminoAcids(birds_aaVLFconvert, birds_aminoAcids, 216)
birds_aaVLFreduced <- aa.VLF.reduced(birds_aminoAcidVLFs, birds_aminoAcid_specimenVLFcount, 216)
birds_aaSpecies <- separate(birds_aaVLFreduced)
birds_aminoAcid_singleAndShared <- aa.find.singles(birds_aaSpecies, 216)
# Concordance analysis
VLF_match <- find.matching(bird_VLFreduced, birds_aaVLFreduced, 648, 216)
position_matchingNuc <- nucleotide.matching.positions(VLF_match[[2]], 648)
position_matchingAA <- aminoAcid.matching.positions(VLF_match[[1]], 216)
matching_comparison <- overall.matched(position_matchingNuc, position_matchingAA, 648, 216)
matching_codons <- matched.codon.position(matching_comparison)
## End(Not run)
Calculates the modal nucleotide sequence of a matrix of sequences, i.e. the sequence formed by the most frequent nucleotide at each position.
MODE(freq, seqlength)
freq | Frequency matrix for nucleotides. |
seqlength | Length of the nucleotide sequence. |
The argument freq can be calculated using the function ffrequency.matrix.function.
A vector containing the first modal sequence.
Taryn B. T. Athey and Paul D. McNicholas
## Not run:
data(birds)
species.names <- birds[,2]
specimen.Number <- nrow(birds)
rownames(birds) <- species.names
Nuc.count <- count.function(birds, specimen.Number, 648)
frequency.matrix <- ffrequency.matrix.function(Nuc.count, 648)
birdSpec.freq <- specimen.frequencies(frequency.matrix, birds, specimen.Number, species.names, 648)
nucleotide.modalSequence <- MODE(frequency.matrix, 648)
## End(Not run)
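The idea behind MODE can be sketched in base R. This is an illustration and not the package's implementation. It builds its own toy frequency matrix with an assumed layout: one row per character (A, C, G, T and -) and one column per position. It then takes the most frequent character at each position. Because which.max() returns the first maximum, ties go to the earlier character, which is one reading of "first modal sequence".

```r
# Toy aligned sequences (hypothetical data, one sequence per row)
seqs <- rbind(c("A", "C", "G", "T"),
              c("A", "C", "G", "A"),
              c("A", "T", "G", "T"))
chars <- c("A", "C", "G", "T", "-")

# Assumed frequency matrix layout: rows = characters, columns = positions
freq <- sapply(seq_len(ncol(seqs)), function(j)
  sapply(chars, function(ch) mean(seqs[, j] == ch)))

# which.max() returns the first maximum, so ties go to the earlier character
modal <- chars[apply(freq, 2, which.max)]
print(modal)
```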
Returns the frequency of the modal nucleotide at each position, i.e. the frequencies along the first modal sequence.
MODE.freq(freq, seqlength)
freq | Frequency matrix for nucleotides. |
seqlength | Length of the nucleotide sequence. |
The argument freq can be calculated using the function ffrequency.matrix.function.
A vector of frequencies for the first modal sequence.
Taryn B. T. Athey and Paul D. McNicholas
## Not run:
data(birds)
species.names <- birds[,2]
specimen.Number <- nrow(birds)
rownames(birds) <- species.names
Nuc.count <- count.function(birds, specimen.Number, 648)
frequency.matrix <- ffrequency.matrix.function(Nuc.count, 648)
birdSpec.freq <- specimen.frequencies(frequency.matrix, birds, specimen.Number, species.names, 648)
nucleotide.modalSequence <- MODE(frequency.matrix, 648)
Bird_first.modal.frequencies <- MODE.freq(frequency.matrix, 648)
## End(Not run)
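Assume a hypothetical frequency matrix with one row per character and one column per position. Under that assumption, the frequencies along the first modal sequence are simply the column maxima. The following is a minimal base R sketch on made-up values, not the package's code:

```r
# Hypothetical frequency matrix: rows = characters, columns = positions
# (matrix() fills by column, so each group of five values is one position)
freq <- matrix(c(1.0, 0.0, 0.0, 0.0, 0.0,
                 0.0, 2/3, 0.0, 1/3, 0.0,
                 0.0, 0.0, 1.0, 0.0, 0.0,
                 1/3, 0.0, 0.0, 2/3, 0.0),
               nrow = 5,
               dimnames = list(c("A", "C", "G", "T", "-"), NULL))

# Frequency of the most common character at each position
modal_freq <- apply(freq, 2, max)
```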
Calculates, for each position in a matrix of sequences, the frequency of the nucleotide that occurs second most often.
MODE.second.freq(freq, seqlength)
freq | Frequency matrix for nucleotides. |
seqlength | Length of the nucleotide sequences. |
The argument freq can be calculated using the function ffrequency.matrix.function.
A vector containing, for each position, the frequency of the second most frequent nucleotide.
Taryn B. T. Athey and Paul D. McNicholas
## Not run:
data(birds)
species.names <- birds[,2]
specimen.Number <- nrow(birds)
rownames(birds) <- species.names
Nuc.count <- count.function(birds, specimen.Number, 648)
frequency.matrix <- ffrequency.matrix.function(Nuc.count, 648)
birdSpec.freq <- specimen.frequencies(frequency.matrix, birds, specimen.Number, species.names, 648)
nucleotide.modalSequence <- MODE(frequency.matrix, 648)
Bird_first.modal.frequencies <- MODE.freq(frequency.matrix, 648)
Bird_second.modal.frequencies <- MODE.second.freq(frequency.matrix, 648)
## End(Not run)
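Here is a base R sketch of the second-highest frequency per position. It uses a hypothetical frequency matrix (rows = characters, columns = positions) rather than the package's internal format. Each column is sorted in decreasing order and the second value is taken. As a result, a fully conserved position reports 0, and a position with a tie for first place reports the tied frequency. How MODE.second.freq itself treats ties is not stated here.

```r
# Hypothetical frequency matrix: rows = characters, columns = positions
freq <- matrix(c(1.0, 0.0, 0.0, 0.0, 0.0,
                 0.0, 2/3, 0.0, 1/3, 0.0,
                 0.0, 0.0, 1.0, 0.0, 0.0,
                 1/3, 0.0, 0.0, 2/3, 0.0),
               nrow = 5,
               dimnames = list(c("A", "C", "G", "T", "-"), NULL))

# Second-highest frequency at each position (0 when the position is conserved)
second_freq <- apply(freq, 2, function(col) sort(col, decreasing = TRUE)[2])
```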
Calculates the positions of the VLFs in a matrix of ntVLFs whose specimen identifiers match those in a matrix of aaVLFs.
nucleotide.matching.positions(matchNuc, nuclength)
matchNuc | A matrix containing only the nucleotides that are VLFs, with NAs in all other positions of the sequences. |
nuclength | The length of the nucleotide sequence. |
The argument matchNuc can be calculated using the function find.matching.
A list with one entry per ntVLF. Each entry contains the specimen identifier in the first position, the species name in the second position, and the position of the ntVLF in the third position.
Taryn B. T. Athey and Paul D. McNicholas
## Not run:
# Nucleotide VLF analysis
data(birds)
species.names <- birds[,2]
specimen.Number <- nrow(birds)
rownames(birds) <- species.names
Nuc.count <- count.function(birds, specimen.Number, 648)
frequency.matrix <- ffrequency.matrix.function(Nuc.count, 648)
birdSpec.freq <- specimen.frequencies(frequency.matrix, birds, specimen.Number, species.names, 648)
Bird_specimen_VLFcount <- VLF.count.spec(birdSpec.freq, 0.001, 648)
bird_VLFconvert <- VLF.convert.matrix(birds, birdSpec.freq, 0.001, 648)
bird_VLFnuc <- VLF.nucleotides(bird_VLFconvert, birds, 648)
bird_VLFreduced <- VLF.reduced(bird_VLFnuc, Bird_specimen_VLFcount, 648)
bird_species <- separate(bird_VLFreduced)
birds_singleAndShared <- find.singles(bird_species, 648)
# Amino acid VLF analysis
data(birds_aminoAcids)
birds_aminoAcid_speciesNames <- birds_aminoAcids[,2]
aminoAcids_specimenNumber <- nrow(birds_aminoAcids)
birds_aminoAcid_count <- aa.count.function(birds_aminoAcids, 216)
aminoAcid_frequency.Matrix <- aa.frequency.matrix.function(birds_aminoAcid_count, 216)
bird_aminoAcid_frequencies <- aa.specimen.frequencies(aminoAcid_frequency.Matrix, birds_aminoAcids, birds_aminoAcid_speciesNames, 216)
aminoAcid_Modal <- aa.MODE(aminoAcid_frequency.Matrix, 216)
birds_aminoAcid_specimenVLFcount <- aa.VLF.count.spec(bird_aminoAcid_frequencies, 0.001, 216)
birds_aaVLFconvert <- aa.VLF.convert.matrix(birds_aminoAcids, bird_aminoAcid_frequencies, 0.001, 216)
birds_aminoAcidVLFs <- VLF.aminoAcids(birds_aaVLFconvert, birds_aminoAcids, 216)
birds_aaVLFreduced <- aa.VLF.reduced(birds_aminoAcidVLFs, birds_aminoAcid_specimenVLFcount, 216)
birds_aaSpecies <- separate(birds_aaVLFreduced)
birds_aminoAcid_singleAndShared <- aa.find.singles(birds_aaSpecies, 216)
# Concordance analysis
VLF_match <- find.matching(bird_VLFreduced, birds_aaVLFreduced, 648, 216)
position_matchingNuc <- nucleotide.matching.positions(VLF_match[[2]], 648)
## End(Not run)
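The idea can be sketched on a hypothetical matrix laid out like the package's sequence matrices. The specimen identifier is in column 1, the species name in column 2, and the sequence starts at column 3. VLF nucleotides are kept and every other cell is NA. The sketch returns one list entry per VLF, holding the identifier, the species and the sequence position. It is not the package's implementation.

```r
# Hypothetical matched matrix: ID, species, then 4 sequence positions
matchNuc <- rbind(c("BIRD001", "Sp_a", NA, "T", NA, NA),
                  c("BIRD002", "Sp_b", NA, NA, NA, "G"))

# Locate every non-NA cell in the sequence columns (column 3 onward)
seq_part <- matchNuc[, -(1:2), drop = FALSE]
hits <- which(!is.na(seq_part), arr.ind = TRUE)

# One entry per VLF: identifier, species name, position in the sequence
positions <- lapply(seq_len(nrow(hits)), function(k) {
  r <- hits[k, "row"]
  list(matchNuc[r, 1], matchNuc[r, 2], hits[k, "col"])
})
```

Positions are counted from the first sequence column, so column 3 of the matrix is position 1.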
Compares the ntVLFs and aaVLFs with the same specimen identifier, and determines which ntVLFs are concordant with aaVLFs.
overall.matched(positionNuc, positionAA, nuclength, aalength)
positionNuc | A list containing the specimen names and ntVLF positions for specimens that have both aaVLFs and ntVLFs. |
positionAA | A list containing the specimen names, the aaVLFs, and the aaVLF positions for specimens that have both aaVLFs and ntVLFs. |
nuclength | The length of the nucleotide sequence (should be 3 times the length of the amino acid sequence). |
aalength | The length of the amino acid sequence (should be 1/3 the length of the nucleotide sequence). |
The argument positionNuc can be calculated using the function nucleotide.matching.positions. The argument positionAA can be calculated using the function aminoAcid.matching.positions.
A list with one entry per concordant ntVLF. Each entry contains:
- the specimen identifier in the first position;
- the species name in the second position;
- the aaVLF in the third position;
- the amino acid position of the aaVLF in the fourth position;
- the codon position of the concordant ntVLF in the fifth position.

If several ntVLFs are concordant with one aaVLF, that aaVLF may appear in several entries, one for each ntVLF.
Taryn B. T. Athey and Paul D. McNicholas
## Not run:
# Nucleotide VLF analysis
data(birds)
species.names <- birds[,2]
specimen.Number <- nrow(birds)
rownames(birds) <- species.names
Nuc.count <- count.function(birds, specimen.Number, 648)
frequency.matrix <- ffrequency.matrix.function(Nuc.count, 648)
birdSpec.freq <- specimen.frequencies(frequency.matrix, birds, specimen.Number, species.names, 648)
Bird_specimen_VLFcount <- VLF.count.spec(birdSpec.freq, 0.001, 648)
bird_VLFconvert <- VLF.convert.matrix(birds, birdSpec.freq, 0.001, 648)
bird_VLFnuc <- VLF.nucleotides(bird_VLFconvert, birds, 648)
bird_VLFreduced <- VLF.reduced(bird_VLFnuc, Bird_specimen_VLFcount, 648)
bird_species <- separate(bird_VLFreduced)
birds_singleAndShared <- find.singles(bird_species, 648)
# Amino acid VLF analysis
data(birds_aminoAcids)
birds_aa_speciesNames <- birds_aminoAcids[,2]
aminoAcids_specimenNumber <- nrow(birds_aminoAcids)
birds_aminoAcid_count <- aa.count.function(birds_aminoAcids, 216)
aa_freq.Mat <- aa.frequency.matrix.function(birds_aminoAcid_count, 216)
bird_aa_freq <- aa.specimen.frequencies(aa_freq.Mat, birds_aminoAcids, birds_aa_speciesNames, 216)
aminoAcid_Modal <- aa.MODE(aa_freq.Mat, 216)
birds_aminoAcid_specimenVLFcount <- aa.VLF.count.spec(bird_aa_freq, 0.001, 216)
birds_aaVLFconvert <- aa.VLF.convert.matrix(birds_aminoAcids, bird_aa_freq, 0.001, 216)
birds_aminoAcidVLFs <- VLF.aminoAcids(birds_aaVLFconvert, birds_aminoAcids, 216)
birds_aaVLFreduced <- aa.VLF.reduced(birds_aminoAcidVLFs, birds_aminoAcid_specimenVLFcount, 216)
birds_aaSpecies <- separate(birds_aaVLFreduced)
birds_aminoAcid_singleAndShared <- aa.find.singles(birds_aaSpecies, 216)
# Concordance analysis
VLF_match <- find.matching(bird_VLFreduced, birds_aaVLFreduced, 648, 216)
position_matchingNuc <- nucleotide.matching.positions(VLF_match[[2]], 648)
position_matchingAA <- aminoAcid.matching.positions(VLF_match[[1]], 216)
matching_comparison <- overall.matched(position_matchingNuc, position_matchingAA, 648, 216)
## End(Not run)
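The concordance check rests on codon arithmetic. Assume the nucleotide sequence is in frame with the amino acid sequence, so that nuclength = 3 x aalength, as with 648 and 216 in the examples. Then nucleotide position n lies in amino acid position ceiling(n / 3), at codon position ((n - 1) %% 3) + 1. The sketch applies that rule to hypothetical ntVLF and aaVLF records for the same specimens. It illustrates the idea and is not the package's code; the data frames and helper names are made up.

```r
# Map a nucleotide position to its amino acid (codon) index and codon position
codon_of <- function(n) ceiling(n / 3)
codon_pos <- function(n) ((n - 1) %% 3) + 1

# Hypothetical VLF records: specimen ID plus position
nt <- data.frame(id = c("BIRD001", "BIRD001", "BIRD002"),
                 pos = c(4, 10, 14))
aa <- data.frame(id = c("BIRD001", "BIRD002"),
                 aa = c("V", "I"),
                 pos = c(2, 5))

# An ntVLF is concordant when the same specimen has an aaVLF in that codon
nt$aa_pos <- codon_of(nt$pos)
matched <- merge(nt, aa, by.x = c("id", "aa_pos"), by.y = c("id", "pos"))
matched$codon_position <- codon_pos(matched$pos)
```

Nucleotide 10 of BIRD001 falls in codon 4, where that specimen has no aaVLF, so it is dropped.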
Separates specimens into lists by species name.
separate(x)
x | A matrix of sequences, usually reduced sequences containing only VLFs, where the second column contains the species name of each specimen. |
If x is to be a reduced matrix, it can be calculated using the function VLF.reduced.
A list containing a matrix of sequences for each species.
Taryn B. T. Athey and Paul D. McNicholas
## Not run:
data(birds)
species.names <- birds[,2]
specimen.Number <- nrow(birds)
rownames(birds) <- species.names
Nuc.count <- count.function(birds, specimen.Number, 648)
frequency.matrix <- ffrequency.matrix.function(Nuc.count, 648)
birdSpec.freq <- specimen.frequencies(frequency.matrix, birds, specimen.Number, species.names, 648)
Bird_specimen_VLFcount <- VLF.count.spec(birdSpec.freq, 0.001, 648)
Bird_position_VLFcount <- VLF.count.pos(birdSpec.freq, 0.001, 648)
bird_VLFconvert <- VLF.convert.matrix(birds, birdSpec.freq, 0.001, 648)
bird_VLFnuc <- VLF.nucleotides(bird_VLFconvert, birds, 648)
bird_VLFreduced <- VLF.reduced(bird_VLFnuc, Bird_specimen_VLFcount, 648)
bird_species <- separate(bird_VLFreduced)
## End(Not run)
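Grouping rows by species can be sketched with base R's split(). As the argument description says, the sketch assumes column 2 holds the species name. The toy matrix is hypothetical, and this is an illustration rather than the package's code. The result is a named list with one matrix per species.

```r
# Hypothetical reduced matrix: ID, species, then sequence positions
x <- rbind(c("BIRD001", "Sp_a", "A", NA),
           c("BIRD002", "Sp_b", NA, "T"),
           c("BIRD003", "Sp_a", NA, "G"))

# Group row indices by species name (column 2), then extract each group
by_species <- lapply(split(seq_len(nrow(x)), x[, 2]),
                     function(i) x[i, , drop = FALSE])
```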
Creates a sliding window analysis plot for the VLFs in a matrix of sequences.
Sliding.Window(VLF, seqlength, n = 30)
VLF | VLF counts per position across the barcode. This can be a single vector of all VLFs per position, or a matrix containing singleton and shared VLFs. |
seqlength | Length of the barcode sequence. |
n | The number of positions to average the window across (n = 30 by default). |
The argument VLF can be calculated using the function VLF.count.pos for all VLFs, or find.singles for singleton and shared VLFs.
A sliding window plot of the VLFs per position of the barcode, averaged over windows of n positions.
Taryn B. T. Athey and Paul D. McNicholas
## Not run:
data(birds)
species.names <- birds[,2]
specimen.Number <- nrow(birds)
rownames(birds) <- species.names
Nuc.count <- count.function(birds, specimen.Number, 648)
frequency.matrix <- ffrequency.matrix.function(Nuc.count, 648)
birdSpec.freq <- specimen.frequencies(frequency.matrix, birds, specimen.Number, species.names, 648)
Bird_position_VLFcount <- VLF.count.pos(birdSpec.freq, 0.001, 648)
Sliding.Window(Bird_position_VLFcount, 648)
## End(Not run)
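The averaging behind a sliding window plot can be sketched as follows. The exact windowing used by Sliding.Window is not stated here. This sketch therefore assumes that a window starts at every position and covers n consecutive positions. The helper name window_means is hypothetical, and the plotting step is left as a comment.

```r
# Mean VLF count over each window of n consecutive positions
# (assumed windowing: windows start at 1, 2, ..., length(vlf) - n + 1)
window_means <- function(vlf, n = 30) {
  starts <- seq_len(length(vlf) - n + 1)
  sapply(starts, function(s) mean(vlf[s:(s + n - 1)]))
}

vlf_counts <- c(0, 2, 1, 0, 3, 0, 0, 1)  # hypothetical per-position VLF counts
means <- window_means(vlf_counts, n = 3)
# plot(means, type = "l") would draw the window averages
```

The sketch requires at least n positions; with 648 positions and n = 30 it yields 619 window means.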
Converts a matrix of sequences into a matrix of nucleotide frequencies.
specimen.frequencies(freq, seq.matrix, no.spec, spec.names, seqlength)
freq | Frequency matrix for nucleotides. |
seq.matrix | Matrix of specimen sequences, where the first and second columns contain the unique specimen identifier and the species name, respectively, and the sequence starts in the third column. |
no.spec | The number of specimens in seq.matrix. |
spec.names | A vector containing the names of the specimens in seq.matrix, in the order they appear in the matrix. |
seqlength | The length of the nucleotide sequence. |
The argument freq can be calculated using the function ffrequency.matrix.function. The number of specimens can be calculated by applying nrow() to seq.matrix.
A matrix containing the unique specimen identifier in the first column, the species name in the second column, and the frequency of each nucleotide in the sequence starting at the third column.
Taryn B. T. Athey and Paul D. McNicholas
## Not run:
data(birds)
species.names <- birds[,2]
specimen.Number <- nrow(birds)
rownames(birds) <- species.names
Nuc.count <- count.function(birds, specimen.Number, 648)
frequency.matrix <- ffrequency.matrix.function(Nuc.count, 648)
birdSpec.freq <- specimen.frequencies(frequency.matrix, birds, specimen.Number, species.names, 648)
## End(Not run)
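Conceptually, each nucleotide in a specimen's sequence is replaced by that nucleotide's frequency at the same position across all specimens. This sketch assumes the column layout described above: identifier, then species, then sequence. It also builds a hypothetical frequency matrix with one row per character. It is not the package's implementation, and it returns only the numeric block without the identifier and species columns.

```r
# Hypothetical input: ID, species, then a 2-position sequence
seq.matrix <- rbind(c("BIRD001", "Sp_a", "A", "C"),
                    c("BIRD002", "Sp_a", "A", "T"),
                    c("BIRD003", "Sp_b", "A", "C"))
chars <- c("A", "C", "G", "T", "-")
bases <- seq.matrix[, -(1:2), drop = FALSE]

# Frequency of each character at each position (rows = characters)
freq <- sapply(seq_len(ncol(bases)), function(j)
  sapply(chars, function(ch) mean(bases[, j] == ch)))

# Look up freq[character, position] for every cell of the sequence block
idx <- cbind(match(bases, chars), as.vector(col(bases)))
spec_freq <- matrix(freq[idx], nrow = nrow(bases))
```

Characters outside the assumed alphabet (for example N) would map to NA.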
Converts a matrix of amino acid frequencies into a matrix of amino acids.
VLF.aminoAcids(convert.matrix, seq.matrix, seqlength)
convert.matrix | A matrix consisting of only aaVLF frequencies for each specimen, and NAs in every other position of the sequence. |
seq.matrix | A matrix of amino acid sequences. |
seqlength | The length of the amino acid sequence. |
The argument convert.matrix can be calculated using the function aa.VLF.convert.matrix.
A matrix containing only aaVLFs and NAs in every other position of the sequence.
Taryn B. T. Athey and Paul D. McNicholas
## Not run:
data(birds_aminoAcids)
birds_aminoAcid_speciesNames <- birds_aminoAcids[,2]
aminoAcids_specimenNumber <- nrow(birds_aminoAcids)
birds_aminoAcid_count <- aa.count.function(birds_aminoAcids, 216)
aminoAcid_frequency.Matrix <- aa.frequency.matrix.function(birds_aminoAcid_count, 216)
bird_aminoAcid_frequencies <- aa.specimen.frequencies(aminoAcid_frequency.Matrix, birds_aminoAcids, birds_aminoAcid_speciesNames, 216)
aminoAcid_Modal <- aa.MODE(aminoAcid_frequency.Matrix, 216)
birds_aminoAcid_specimenVLFcount <- aa.VLF.count.spec(bird_aminoAcid_frequencies, 0.001, 216)
birds_aaVLFconvert <- aa.VLF.convert.matrix(birds_aminoAcids, bird_aminoAcid_frequencies, 0.001, 216)
birds_aminoAcidVLFs <- VLF.aminoAcids(birds_aaVLFconvert, birds_aminoAcids, 216)
## End(Not run)
Converts a matrix of nucleotide frequencies for each specimen into a matrix consisting entirely of very low frequency variant (VLF) frequencies, with NAs in all other positions.
VLF.convert.matrix(seq.matrix, freq, p, seqlength)
seq.matrix | A matrix of aligned DNA barcode sequences. |
freq | A matrix of nucleotide frequencies for each specimen. |
p | A very low frequency variant (VLF) designation cut-off frequency. Any frequency in the freq matrix below this value is considered to be a VLF. |
seqlength | Length of the nucleotide sequence. |
The argument freq can be calculated using the function specimen.frequencies.
A matrix of VLF nucleotide frequencies, containing only those frequencies that fall below the cut-off p, and NAs in all other positions of the matrix.
Taryn B. T. Athey and Paul D. McNicholas
## Not run:
data(birds)
species.names <- birds[,2]
specimen.Number <- nrow(birds)
rownames(birds) <- species.names
Nuc.count <- count.function(birds, specimen.Number, 648)
frequency.matrix <- ffrequency.matrix.function(Nuc.count, 648)
birdSpec.freq <- specimen.frequencies(frequency.matrix, birds, specimen.Number, species.names, 648)
Bird_specimen_VLFcount <- VLF.count.spec(birdSpec.freq, 0.001, 648)
bird_VLFconvert <- VLF.convert.matrix(birds, birdSpec.freq, 0.001, 648)
## End(Not run)
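The thresholding rule ("below p is a VLF") can be sketched with ifelse(), which keeps the matrix dimensions of its test argument. The frequency values are hypothetical. For simplicity, the identifier and species columns that the package's matrices carry are left out. This illustrates the rule and does not reproduce VLF.convert.matrix, which also takes seq.matrix and seqlength.

```r
# Hypothetical per-specimen frequencies (rows = specimens, columns = positions)
freq <- rbind(c(0.9990, 0.0005),
              c(0.0008, 0.9995))
p <- 0.001

# Keep frequencies below the cut-off p; everything else becomes NA
vlf_only <- ifelse(freq < p, freq, NA)
```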
Calculates the number of very low frequency variants (VLFs) in each position in a matrix of sequences.
VLF.count.pos(freq, p, seqlength)
freq | A matrix of frequencies for each specimen. |
p | A very low frequency variant (VLF) designation cut-off frequency. Any frequency in the freq matrix below this value is considered to be a VLF. |
seqlength | The length of the sequences. |
The argument freq can be calculated using the specimen.frequencies function.
A vector containing the number of VLFs for each position in the sequence.
Taryn B. T. Athey and Paul D. McNicholas
## Not run:
data(birds)
species.names <- birds[,2]
specimen.Number <- nrow(birds)
rownames(birds) <- species.names
Nuc.count <- count.function(birds, specimen.Number, 648)
frequency.matrix <- ffrequency.matrix.function(Nuc.count, 648)
birdSpec.freq <- specimen.frequencies(frequency.matrix, birds, specimen.Number, species.names, 648)
Bird_specimen_VLFcount <- VLF.count.spec(birdSpec.freq, 0.001, 648)
Bird_position_VLFcount <- VLF.count.pos(birdSpec.freq, 0.001, 648)
## End(Not run)
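Counting VLFs per position amounts to counting, column by column, how many frequencies fall below p. Here is a base R sketch on a hypothetical numeric matrix (rows = specimens, columns = positions). It assumes the identifier and species columns have already been dropped. It is an illustration, not the package's code.

```r
# Hypothetical per-specimen frequencies (rows = specimens, columns = positions)
freq <- rbind(c(0.9990, 0.0005, 0.9990),
              c(0.0008, 0.9995, 0.9990),
              c(0.9990, 0.0005, 0.9990))
p <- 0.001

# Number of specimens whose frequency at each position falls below p
vlf_per_position <- colSums(freq < p)
```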
Calculates the number of very low frequency variants (VLFs) for each specimen in a matrix of sequence nucleotide frequencies.
VLF.count.spec(freq, p, seqlength)
freq | A matrix of nucleotide frequencies for each specimen. |
p | A very low frequency variant (VLF) designation cut-off frequency. Any frequency in the freq matrix below this value is considered to be a VLF. |
seqlength | The length of the sequences. |
The argument freq can be calculated using the function specimen.frequencies.
A vector containing the number of VLFs for each specimen in the matrix.
Taryn B. T. Athey and Paul D. McNicholas
## Not run:
data(birds)
species.names <- birds[,2]
specimen.Number <- nrow(birds)
rownames(birds) <- species.names
Nuc.count <- count.function(birds, specimen.Number, 648)
frequency.matrix <- ffrequency.matrix.function(Nuc.count, 648)
birdSpec.freq <- specimen.frequencies(frequency.matrix, birds, specimen.Number, species.names, 648)
Bird_specimen_VLFcount <- VLF.count.spec(birdSpec.freq, 0.001, 648)
## End(Not run)
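Per-specimen counts are the row-wise version of the same rule: count how many frequencies in each row fall below p. The sketch assumes a hypothetical numeric matrix with the identifier and species columns removed. It is not the package's implementation.

```r
# Hypothetical per-specimen frequencies (rows = specimens, columns = positions)
freq <- rbind(c(0.9990, 0.0005, 0.0002),
              c(0.9990, 0.9995, 0.9990),
              c(0.0008, 0.9995, 0.9990))
p <- 0.001

# Number of VLFs carried by each specimen
vlf_per_specimen <- rowSums(freq < p)
```

Specimens with a count of 0 are the ones a later reduction step would drop.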
Converts a matrix of nucleotide frequencies for each specimen into a matrix of nucleotides for each specimen.
VLF.nucleotides(convert.matrix, seq.matrix, seqlength)
convert.matrix | A matrix consisting of only very low frequency variant frequencies for each specimen, and NAs in all other positions of the sequence. |
seq.matrix | A matrix of DNA sequences. |
seqlength | The length of the sequences. |
The argument convert.matrix can be calculated using the function VLF.convert.matrix.
A matrix containing only ntVLFs in each position of the sequences, and NAs in all other positions.
## Not run:
data(birds)
species.names <- birds[,2]
specimen.Number <- nrow(birds)
rownames(birds) <- species.names
Nuc.count <- count.function(birds, specimen.Number, 648)
frequency.matrix <- ffrequency.matrix.function(Nuc.count, 648)
birdSpec.freq <- specimen.frequencies(frequency.matrix, birds, specimen.Number, species.names, 648)
Bird_specimen_VLFcount <- VLF.count.spec(birdSpec.freq, 0.001, 648)
bird_VLFconvert <- VLF.convert.matrix(birds, birdSpec.freq, 0.001, 648)
bird_VLFnuc <- VLF.nucleotides(bird_VLFconvert, birds, 648)
## End(Not run)
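Turning VLF frequencies back into nucleotides can be sketched with ifelse(). Wherever the frequency matrix holds a value, the nucleotide at the same cell of the sequence block is kept; every other cell becomes NA. Both matrices below are hypothetical and contain only the sequence block, without identifier or species columns. This is an illustration, not the package's code.

```r
# Hypothetical inputs: VLF frequencies with NA elsewhere, and the matching bases
convert <- rbind(c(NA, 0.0005, NA),
                 c(0.0008, NA, NA))
bases <- rbind(c("A", "T", "G"),
               c("C", "C", "G"))

# Keep the nucleotide wherever a VLF frequency is present, NA otherwise
vlf_bases <- ifelse(is.na(convert), NA, bases)
```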
Reduces a matrix of very low frequency variants (VLFs) so that only the specimens that contain VLFs remain in the matrix.
VLF.reduced(NA.matrix, sCount, seqlength)
NA.matrix | A matrix with values for very low frequency variants (VLFs) for each specimen and NAs in all other positions of the sequence. |
sCount | A vector of the very low frequency variant (VLF) counts for each specimen in NA.matrix. |
seqlength | The length of the sequences. |
The argument NA.matrix can be calculated using the functions VLF.convert.matrix and VLF.nucleotides. The argument sCount can be calculated using the function VLF.count.spec.
A matrix containing only the specimens that have VLFs, with each VLF in its sequence position and NAs in all other positions.
Taryn B. T. Athey and Paul D. McNicholas
## Not run:
data(birds)
species.names <- birds[,2]
specimen.Number <- nrow(birds)
rownames(birds) <- species.names
Nuc.count <- count.function(birds, specimen.Number, 648)
frequency.matrix <- ffrequency.matrix.function(Nuc.count, 648)
birdSpec.freq <- specimen.frequencies(frequency.matrix, birds, specimen.Number, species.names, 648)
Bird_specimen_VLFcount <- VLF.count.spec(birdSpec.freq, 0.001, 648)
bird_VLFconvert <- VLF.convert.matrix(birds, birdSpec.freq, 0.001, 648)
bird_VLFnuc <- VLF.nucleotides(bird_VLFconvert, birds, 648)
bird_VLFreduced <- VLF.reduced(bird_VLFnuc, Bird_specimen_VLFcount, 648)
## End(Not run)
Runs the full nucleotide VLF analysis for the user and outputs the results.
vlfFun(x, p = 0.001, seqlength = 648, own = NULL)
x |
A matrix of nucleotide sequences with the first column containing the unique specimen identifier, the second column containing the species name and the remaining columns containing the nucleotide sequence. |
p |
A VLF designation frequency cut-off to be used within the analysis. By default p = 0.001. |
seqlength |
The length of the nucleotide sequence. By default seqlength = 648. |
own |
If the user wants to analyse their own sequences separately from the reference sequences, this argument can be used. Like x, it is a matrix of nucleotide sequences with the first column containing the unique specimen identifier, the second column containing the species name and the remaining columns containing the nucleotide sequence. By default own = NULL. |
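To illustrate how the cut-off p can be applied, the sketch below flags, at each position, any nucleotide whose relative frequency among all specimens falls below p, then tallies the flags per specimen and per position. The helper find_vlfs is hypothetical, and the rule "frequency strictly below p" is an assumption about the package's exact comparison. A variant carried by a single specimen has frequency 1/n, so with the default p = 0.001 it can only fall below the cut-off when the dataset holds more than 1,000 sequences; the toy data therefore uses p = 0.25.

```r
# Hypothetical sketch: flag very low frequency variants (VLFs) with cut-off p.
# Assumption: a base is a VLF if its relative frequency at a position is < p.
find_vlfs <- function(seqs, p = 0.001) {
  # seqs: character matrix, one row per specimen, one column per position
  vlf <- matrix(FALSE, nrow(seqs), ncol(seqs), dimnames = dimnames(seqs))
  for (j in seq_len(ncol(seqs))) {
    freq <- table(seqs[, j]) / nrow(seqs)  # relative frequency of each base
    rare <- names(freq)[freq < p]          # bases below the cut-off
    vlf[, j] <- seqs[, j] %in% rare
  }
  vlf
}

seqs <- rbind(
  s1 = c("A", "C", "G", "T"),
  s2 = c("A", "C", "G", "T"),
  s3 = c("A", "C", "A", "T"),
  s4 = c("A", "C", "G", "T"),
  s5 = c("A", "C", "G", "T")
)
vlf <- find_vlfs(seqs, p = 0.25)
rowSums(vlf)  # VLFs per specimen: only s3 has one
colSums(vlf)  # VLFs per position: only position 3 has one
```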
modal |
A vector containing the nucleotide sequence that occurs most often in the dataset. |
con100 |
The number of nucleotide positions that are 100% conserved in the sequence, separated by codon position. |
conp |
The number of nucleotide positions that are (1 - p) × 100% conserved in the sequence (99.9% with the default p = 0.001), separated by codon position. |
combine |
The number of nucleotide positions that are (1 - p) × 100% conserved when combining the first and second modal sequences. |
specimen |
A vector containing the number of VLFs for each specimen in the dataset. |
position |
A vector containing the number of VLFs for each position in the sequences. |
sas |
A matrix containing vectors of singleton and shared nucleotide VLF (ntVLF) counts for each position in the sequences. |
VLFmatrix |
A matrix containing only those specimens that have VLFs, with the nucleotides at the positions that contain VLFs and NAs in all other positions of the sequence. |
ownSpecCount |
A vector containing the number of VLFs for each specimen in the user's own dataset. Only appears if own is not NULL. |
ownPosCount |
A vector containing the number of VLFs for each position in the sequences of the user's own dataset. Only appears if own is not NULL. |
ownVLFMatrix |
A matrix for the user's own dataset containing the nucleotides at the positions that contain VLFs and NAs in all other positions of the sequence. Only appears if own is not NULL. |
ownVLFreduced |
A matrix containing only those specimens in the user's own dataset that have VLFs, with the nucleotides at the positions that contain VLFs and NAs in all other positions of the sequence. Only appears if own is not NULL. |
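The distinction behind the sas component, singleton VLFs (found in only one member of a species) versus shared VLFs (found in several members of the same species), can be sketched as follows. classify_vlfs is a hypothetical helper: it takes a logical matrix marking VLF cells, such as one derived from a frequency cut-off, and it assumes that every specimen carrying a shared VLF adds one to that position's shared count. The package may tally shared VLFs differently.

```r
# Hypothetical sketch: split VLFs at each position into singleton and shared.
# vlf: logical matrix (specimens x positions), TRUE where a specimen has a VLF.
classify_vlfs <- function(seqs, vlf, species) {
  singleton <- integer(ncol(seqs))
  shared <- integer(ncol(seqs))
  for (j in seq_len(ncol(seqs))) {
    for (i in which(vlf[, j])) {
      # number of specimens of the same species carrying the same base here
      n_same <- sum(species == species[i] & seqs[, j] == seqs[i, j], na.rm = TRUE)
      if (n_same == 1) {
        singleton[j] <- singleton[j] + 1L
      } else {
        shared[j] <- shared[j] + 1L
      }
    }
  }
  rbind(singleton = singleton, shared = shared)
}

seqs <- rbind(
  s1 = c("A", "C", "G", "T"),
  s2 = c("A", "C", "G", "T"),
  s3 = c("A", "C", "A", "T"),
  s4 = c("A", "T", "G", "T"),
  s5 = c("A", "T", "G", "T")
)
species <- c("sp1", "sp1", "sp1", "sp2", "sp2")
vlf <- matrix(FALSE, nrow(seqs), ncol(seqs))  # VLF cells marked by hand for the toy
vlf[3, 3] <- TRUE    # s3: A at position 3, unique within sp1
vlf[4:5, 2] <- TRUE  # s4 and s5: T at position 2, both sp2
sas <- classify_vlfs(seqs, vlf, species)
print(sas)
```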
Taryn B. T. Athey and Paul D. McNicholas
## Not run:
data(birds)
bird_vlfAnalysis <- vlfFun(birds)
## End(Not run)
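The con100 and conp components count conserved positions separately for each codon position. A minimal sketch of that bookkeeping follows, assuming that a position counts as conserved when its most common base reaches frequency 1 (con100) or at least 1 - p (conp), and that codon positions cycle 1, 2, 3 from the first sequence position. conservation_by_codon is a hypothetical helper, not a package function.

```r
# Hypothetical sketch: count conserved positions by codon position.
# Assumption: "conserved" means the most common base reaches the given frequency.
conservation_by_codon <- function(seqs, p = 0.001) {
  # frequency of the most common base at each position
  top_freq <- apply(seqs, 2, function(col) max(table(col)) / length(col))
  codon_pos <- (seq_len(ncol(seqs)) - 1) %% 3 + 1  # 1, 2, 3, 1, 2, 3, ...
  list(
    con100 = tapply(top_freq == 1, codon_pos, sum),     # 100% conserved
    conp   = tapply(top_freq >= 1 - p, codon_pos, sum)  # (1 - p) x 100% conserved
  )
}

seqs <- rbind(
  c("A", "T", "G", "C", "C", "A"),
  c("A", "T", "G", "C", "C", "G"),
  c("A", "T", "G", "C", "C", "A"),
  c("A", "T", "G", "C", "C", "A")
)
res <- conservation_by_codon(seqs, p = 0.3)
res$con100  # codon positions 1, 2, 3: 2, 2, 1
res$conp    # 2, 2, 2 (position 6 is 75% A, which passes 1 - p = 0.7)
```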