Teknos


Peptide Classification with Machine Learning

Classification and Prediction of Antimicrobial Peptides Using N-gram Representation and Machine Learning

Sujay Ratna
Thomas Jefferson High School for Science and Technology

Abstract

Current antibiotic treatments for infectious diseases are drastically losing effectiveness because the organisms they target have developed resistance to the drugs over time. In the United States, antibiotic-resistant bacterial infections result in more than 23,000 deaths annually, and these deaths are only a small fraction of all cases of antibiotic-resistant infection. Antimicrobial peptides (AMPs), short sequences of amino acid residues that have been experimentally shown to inhibit the propagation of pathogens, are a promising alternative to current antibiotic treatments. In this study, AMP sequences were represented with n-grams, strings of characters of length n. Alphabet reduction simplified the amino acid sequences, and n-gram analysis assigned attributes to each sequence. Machine learning models performed effectively when differentiating between AMPs and non-AMPs and when classifying AMPs by target pathogen class, and this straightforward method of peptide classification achieved accuracy comparable to that of previous studies using more complex algorithms. These findings were used to computationally generate an artificial set of AMPs based on n-gram and amino acid frequencies, greatly increasing the effectiveness of AMP candidate prediction. This knowledge could be applied in the laboratory to synthesize AMPs targeting specific pathogens, giving a clearer target in the search for alternatives to antibiotic treatments.

Introduction

The field of bioinformatics has greatly strengthened our ability to understand biological processes and their mechanisms. Bioinformatics develops and uses data, algorithms, and computational tools to conduct biological research. In this study, a bioinformatics approach has been utilized specifically to classify and predict antimicrobial peptides (AMPs).

As more pathogens develop resistance to previously effective antimicrobial drugs, the number of available treatment options decreases, which could trigger a global health security emergency. New variants of antimicrobial-resistant infections are increasing, especially where antimicrobial drugs are overused, infection control is inadequate, sanitary conditions are poor, or food is handled inappropriately. Because antimicrobial-resistant infections fail to respond to previously effective drugs, they lead to prolonged illness and a higher risk of death. In the United States, antibiotic-resistant bacterial infections result in more than 23,000 deaths annually, a small fraction of all cases of antibiotic-resistant infection. Antimicrobial resistance also places a heavy burden on the world economy. By 2050, due solely to antimicrobial resistance, the world population is estimated to be between 11 and 444 million lower than it would otherwise have been, and the world economy is estimated to lose between $2.1 and $124.5 trillion [2]. Research on alternatives to antimicrobial drugs is still in its early stages, and none have yet been approved for clinical use.

However, antimicrobial peptides have been identified as promising candidates for combating drug-resistant pathogens. AMPs are oligopeptides of five to hundreds of residues that are part of the innate immune systems of virtually all living organisms. Unlike conventional antibiotics, AMPs target and kill a wide range of cells, including various bacteria and fungi as well as cancer cells. Both natural and synthetic AMPs target the lipopolysaccharide layer of cell membranes [1, 8]. They cause cell death by disrupting membrane function, inhibiting the synthesis of external and internal biomolecules, and inhibiting intracellular functions [9, 13]. Because microbes rarely alter their outer membranes, the targets of AMPs are less likely to develop resistance. Many eukaryotic cells are not targeted by AMPs because their membranes are high in cholesterol and carry little anionic charge. AMPs are extremely efficient killers, taking only seconds to kill the target microbe after initial contact with its cell membrane [7, 12].

This study integrates n-gram analysis and machine learning to develop a computational model that can accurately classify antimicrobial peptides. This approach allowed us to analyze how well n-gram frequencies can be used to train machine learning algorithms. N-grams are a commonly used technique in computational linguistics, probability, text categorization, and biology. In this study, an n-gram is a contiguous string of n amino acid residues in a protein sequence. The primary structure, or amino acid sequence, of a protein determines its three-dimensional structure, which implies that disorder, or lack of stable structure, can also be encoded in the sequence [11]. A sequence can be decomposed into a list of overlapping n-grams. A key advantage of n-gram frequencies is that they are a computationally inexpensive way to analyze complex patterns in protein sequences.
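As a minimal sketch of this decomposition (the class and method names are hypothetical, not the authors' code), a window of width n slides one residue at a time, so a sequence of length L yields L - n + 1 overlapping n-grams:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that splits a sequence into overlapping n-grams. */
final class NGramSplitter {
    private NGramSplitter() {}

    /**
     * Returns every contiguous substring of length n, in order of position.
     * A sequence of length L yields L - n + 1 n-grams, or none if L < n.
     */
    static List<String> overlapping(String sequence, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<String> grams = new ArrayList<>();
        // Slide a window of width n one residue at a time.
        for (int start = 0; start + n <= sequence.length(); start++) {
            grams.add(sequence.substring(start, start + n));
        }
        return grams;
    }
}
```

For example, `NGramSplitter.overlapping("KLAKK", 3)` returns the three 3-grams `[KLA, LAK, AKK]`.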

Current AMP predictors use multiple sequence alignments, secondary structure analyses, PSI-BLAST sequence profiles, or distinctive residue compositions [1]. These predictors require analyzing and comparing entire sequences, so they are slower than n-gram methods, which decompose sequences into small chunks that can each be analyzed quantitatively. In addition, the computational techniques used for prediction are generally “black-box” models such as neural networks and support vector machines, and the features these models rely on are not fully understood [6]. To aid feature selection, n-gram frequency data can also be used to train decision trees, which give more insight into how the training data shapes the decision-making process.

Our goals in this study were to uncover specific patterns within the primary structure of AMPs and to effectively distinguish between AMPs and non-antimicrobial peptides (non-AMPs), as well as among subclasses of AMPs. In addition, we aimed to create new AMP amino acid sequences based on the classification patterns discovered. To achieve these goals, we used a novel method that analyzes the frequencies of n-gram combinations with machine learning algorithms. We calculated the frequency of every n-gram (character string of length n) in the amino acid sequences. To reduce the complexity of the required data mining, we used alphabet reduction to group amino acids with similar properties into clusters. This lowered the number of possible n-gram combinations and therefore the number of frequencies that had to be computed. The outcomes of this study could be applied toward synthesizing pathogen-specific AMPs in the laboratory, focusing the search for alternatives to antibiotic treatments.

Materials and Methods

Dataset Creation

This study’s sequence-based approach to classification required peptide sequences to be obtained before any analysis of antimicrobial peptides or their subclasses. To create a raw negative set, 20,258 non-antimicrobial peptides (non-AMPs) were obtained from the UniProt database using UniRef50, meaning that sequences in the non-AMP set shared less than 50% sequence identity with one another. To create a raw positive set, 7,760 AMPs, regardless of subclass, were collected from multiple AMP databases: APD, CancerPPD, PhytAMP, LAMP, DADP, EROP, YADAMP, and Bagel-Joomla. Each database provided text or FASTA files, which were converted with a Java program. Sequences shorter than 20 residues were eliminated to ensure sufficient length for machine learning analysis. Sequences longer than 120 residues were excluded because only a small part of such a sequence may carry the antimicrobial features. Next, class-specific AMP sets (antibacterial peptides, ABPs; antiviral peptides, AVPs; antifungal peptides, AFPs; and antiparasitic peptides, APPs) were downloaded from CAMP, AVPdb, BACTIBASE, HIVdb, BAGEL, and DBAASP. All class-specific AMP sets were mutually exclusive, verified subsets of the raw positive set.
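The length filter could be sketched as follows, assuming standard FASTA input in which each record starts with a `>` header line followed by one or more sequence lines. The class name `FastaLengthFilter` and the inclusive 20 to 120 residue bounds are illustrative assumptions, not the study's actual converter.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical FASTA reader that keeps only sequences within a length window. */
final class FastaLengthFilter {
    static final int MIN_LENGTH = 20;   // shorter sequences are dropped
    static final int MAX_LENGTH = 120;  // longer sequences are dropped

    private FastaLengthFilter() {}

    /** Parses FASTA text and returns header -> sequence for records of 20 to 120 residues. */
    static Map<String, String> parseFiltered(String fastaText) {
        Map<String, String> kept = new LinkedHashMap<>(); // preserves input order
        String header = null;
        StringBuilder residues = new StringBuilder();
        for (String rawLine : fastaText.split("\\R")) {
            String line = rawLine.trim();
            if (line.startsWith(">")) {
                keepIfInRange(kept, header, residues); // finish the previous record
                header = line.substring(1).trim();
                residues.setLength(0);
            } else if (!line.isEmpty()) {
                residues.append(line.toUpperCase()); // a sequence may span several lines
            }
        }
        keepIfInRange(kept, header, residues); // finish the last record
        return kept;
    }

    private static void keepIfInRange(Map<String, String> kept, String header, StringBuilder residues) {
        int length = residues.length();
        if (header != null && length >= MIN_LENGTH && length <= MAX_LENGTH) {
            kept.put(header, residues.toString()); // a repeated header overwrites the earlier record
        }
    }
}
```

Text files in other layouts would need their own parser; only the length check carries over.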

After obtaining AMP and non-AMP sequences, we used a procedure referred to here as transduction to balance the positive and negative sets so that they contained the same number of sequences. In transduction, a simple random subset of the larger set is taken so that both datasets in each trial have exactly the same number of sequences (a procedure also known as random undersampling). This eliminates the number of sequences as a confounding variable and reduces the likelihood that a model overfits to, or is biased toward, the larger set.
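A minimal sketch of this balancing step, assuming both sets are held as in-memory lists; `ClassBalancer` is a hypothetical name, and the fixed random seed is used only so that a trial can be reproduced:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper that balances two sets by random undersampling. */
final class ClassBalancer {
    private ClassBalancer() {}

    /** Simple random sample of `size` items, drawn without replacement. */
    static <T> List<T> randomSubset(List<T> items, int size, Random random) {
        if (size < 0 || size > items.size()) {
            throw new IllegalArgumentException("size must be between 0 and items.size()");
        }
        List<T> shuffled = new ArrayList<>(items); // never mutate the caller's list
        Collections.shuffle(shuffled, random);
        return new ArrayList<>(shuffled.subList(0, size));
    }

    /** Returns [positives, negatives], each with the size of the smaller input set. */
    static <T> List<List<T>> balance(List<T> positives, List<T> negatives, long seed) {
        Random random = new Random(seed); // fixed seed makes a trial reproducible
        int target = Math.min(positives.size(), negatives.size());
        List<List<T>> balanced = new ArrayList<>();
        balanced.add(randomSubset(positives, target, random));
        balanced.add(randomSubset(negatives, target, random));
        return balanced;
    }
}
```

The smaller set is kept whole (only its order changes), while the larger set is cut down to the same size.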

Learning Curve

In addition, to check that small sample sizes of peptide sequences did not lead to overfitting, a learning curve was constructed for the classification of antibacterial peptides against non-antibacterial AMPs, with dataset sizes increasing in increments of 200 sequences.
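One way to generate the dataset sizes for such a learning curve is sketched below, assuming the sizes start at one increment and the full set size is always included as the final point. Each size would then be drawn as a random subset, as in the transduction step, before a model is trained and cross-validated; that part is omitted. `LearningCurveSizes` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that lists dataset sizes for a learning curve. */
final class LearningCurveSizes {
    private LearningCurveSizes() {}

    /** Returns step, 2*step, ... below max, then max itself as the final point. */
    static List<Integer> sizes(int step, int max) {
        if (step <= 0 || max <= 0) {
            throw new IllegalArgumentException("step and max must be positive");
        }
        List<Integer> out = new ArrayList<>();
        for (int size = step; size < max; size += step) {
            out.add(size);
        }
        out.add(max); // always evaluate the full set last
        return out;
    }
}
```

For instance, `sizes(200, 7760)` gives 200, 400, ..., 7,600 and then 7,760, for 39 points in total.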

Alphabet Reduction

The 20-letter amino acid alphabet was reduced to an alphabet of far fewer letters to speed up machine learning, since the model then has fewer n-grams (attributes) to analyze during training and testing. Another problem with the original 20-letter alphabet was that the total number of features would exceed the number of sequences: with 3-grams there are 20³ = 8,000 possible features, more than the at most 7,760 sequences available. This is not ideal because each sequence (instance) contains at most 120 residues, and therefore at most 118 3-grams, so most features would be zero for any given sequence. Reducing the number of letters in the alphabet decreases the number of possible n-grams until the number of sequences is far larger than the number of features, which makes overfitting much less likely. A Java program was written to apply each alphabet reduction option to each set of peptide sequences. The program traverses every sequence in each peptide set and replaces each amino acid with the label of the cluster to which it belongs. The reduced alphabets were taken from outside sources (Table 1). Residues can be clustered based on various properties, including chemical and genetic properties, and reduced alphabets group residues in ways that minimize the loss of key biochemical information.
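The cluster substitution could be sketched as below. The three-cluster grouping in `exampleThreeCluster()` (mostly hydrophobic, charged, and remaining residues) is a hypothetical scheme chosen only for illustration; it is not one of the published reduced alphabets used in this study, and `AlphabetReducer` is likewise an illustrative name.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper that rewrites a sequence in a reduced alphabet. */
final class AlphabetReducer {
    private final Map<Character, Character> clusterOf = new HashMap<>();

    /** Each map entry is (cluster label -> residues in that cluster). */
    AlphabetReducer(Map<Character, String> clusters) {
        for (Map.Entry<Character, String> entry : clusters.entrySet()) {
            for (char residue : entry.getValue().toCharArray()) {
                if (clusterOf.put(residue, entry.getKey()) != null) {
                    throw new IllegalArgumentException("Residue in two clusters: " + residue);
                }
            }
        }
    }

    /** Replaces every residue with the label of its cluster. */
    String reduce(String sequence) {
        StringBuilder reduced = new StringBuilder(sequence.length());
        for (char residue : sequence.toCharArray()) {
            Character label = clusterOf.get(residue);
            if (label == null) {
                throw new IllegalArgumentException("Unmapped residue: " + residue);
            }
            reduced.append(label);
        }
        return reduced.toString();
    }

    /** Illustrative three-cluster grouping only; not one of the study's alphabets. */
    static AlphabetReducer exampleThreeCluster() {
        Map<Character, String> clusters = new HashMap<>();
        clusters.put('h', "AVLIMFWC"); // mostly hydrophobic
        clusters.put('c', "DEKRH");    // charged or ionizable side chains
        clusters.put('p', "GSTNQYP");  // remaining polar and special residues
        return new AlphabetReducer(clusters);
    }
}
```

With this grouping, `exampleThreeCluster().reduce("KLAKK")` returns `"chhcc"`. Rejecting unmapped letters (such as X) is a design choice; a real pipeline might instead drop or flag such sequences.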

Table 1. Number of AMP sequences in each class-specific AMP set

N-gram Frequency Calculation

Three-letter n-grams (3-grams) were used; with a three-cluster reduced alphabet, for example, there are 3³ = 27 possible 3-grams. The frequency of each 3-gram in a peptide was calculated by counting how often that 3-gram occurs in the sequence and dividing by the total number of 3-grams in the peptide:

$$f_{ijk} = \frac{c_{ijk}}{N}, \qquad N = L - n + 1,$$

where $c_{ijk}$ is the number of times the 3-gram "ijk" occurs in the sequence, $N$ is the total number of n-grams in the sequence, and $L$ is the length of the sequence. n could not be made too large, because the total number of possible n-grams would then risk exceeding the size of the data set, which could cause model overfitting. The n-gram frequencies were then normalized by dividing each frequency by the product of the frequencies of its constituent (reduced) residues, so that n-grams made of common residues would not dominate the decision process simply because those residues are abundant. The 3-gram "ijk" was normalized as

$$q_{ijk} = \frac{f_{ijk}}{f_i \, f_j \, f_k},$$

where $f_i$, $f_j$, and $f_k$ are the frequencies of the constituent reduced residues.
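The frequency calculation (count divided by the number of n-grams in the sequence) and the normalization by the product of residue frequencies can be sketched as follows. This assumes the residue frequencies are computed within the same sequence (the text does not say whether they are per-sequence or dataset-wide) and assigns q = 0 to an n-gram that never occurs; `NGramFeatures` is a hypothetical name. Every possible n-gram is emitted in a fixed order so that each sequence maps to the same attribute columns.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper computing f (n-gram frequency) and q (normalized frequency). */
final class NGramFeatures {
    private NGramFeatures() {}

    /** All |alphabet|^n strings, in the order given by the alphabet string (27 for "hcp", n = 3). */
    static List<String> allNGrams(String alphabet, int n) {
        List<String> grams = new ArrayList<>();
        grams.add("");
        for (int position = 0; position < n; position++) {
            List<String> longer = new ArrayList<>();
            for (String prefix : grams) {
                for (char letter : alphabet.toCharArray()) {
                    longer.add(prefix + letter);
                }
            }
            grams = longer;
        }
        return grams;
    }

    /** f = count / (L - n + 1) for every possible n-gram, including those that never occur. */
    static Map<String, Double> frequencies(String sequence, String alphabet, int n) {
        int total = sequence.length() - n + 1; // N, the number of overlapping n-grams
        Map<String, Integer> counts = new HashMap<>();
        for (int start = 0; start + n <= sequence.length(); start++) {
            counts.merge(sequence.substring(start, start + n), 1, Integer::sum);
        }
        Map<String, Double> freq = new LinkedHashMap<>(); // fixed attribute order
        for (String gram : allNGrams(alphabet, n)) {
            freq.put(gram, total > 0 ? counts.getOrDefault(gram, 0) / (double) total : 0.0);
        }
        return freq;
    }

    /** q = f_gram / (product of single-residue frequencies); assumed 0 when the gram is absent. */
    static Map<String, Double> normalized(String sequence, String alphabet, int n) {
        Map<String, Double> residueFreq = frequencies(sequence, alphabet, 1); // f_i = count_i / L
        Map<String, Double> q = new LinkedHashMap<>();
        for (Map.Entry<String, Double> entry : frequencies(sequence, alphabet, n).entrySet()) {
            double f = entry.getValue();
            if (f == 0.0) {
                q.put(entry.getKey(), 0.0);
                continue;
            }
            double expected = 1.0;
            for (char residue : entry.getKey().toCharArray()) {
                expected *= residueFreq.get(String.valueOf(residue));
            }
            q.put(entry.getKey(), f / expected); // nonzero f implies every residue occurs
        }
        return q;
    }
}
```

For the reduced sequence "hhch" over the alphabet "hcp", the 3-grams "hhc" and "hch" each have f = 0.5, and f_h = 0.75, f_c = 0.25, so q_hhc = 0.5 / (0.75 × 0.75 × 0.25) ≈ 3.56.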

Process Automation

A suite of Java programs was written for data parsing and extraction, data normalization, alphabet reduction, and n-gram frequency calculation. We initially used Python to calculate n-gram frequencies, but our Python scripts did not write data structures such as dictionaries to an output file in our preferred format of a simple array. We therefore wrote a Java program that automates the entire process, from parsing sequences from the databases to creating ARFF files for input into WEKA. Modularity was a main design principle of our code, which improved the program's speed and usability. We used Java collections such as HashMap and ArrayList for efficiency and PrintWriter to format the output.
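A minimal sketch of writing n-gram features to an ARFF file with `PrintWriter`, assuming numeric n-gram attributes and a nominal class attribute as the last column. `ArffWriter` is a hypothetical name, and the relation and class label strings in the usage are placeholders; attribute names containing spaces or special characters would need quoting, which this sketch omits.

```java
import java.io.PrintWriter;
import java.io.Writer;
import java.util.List;

/** Hypothetical writer for an ARFF file with numeric features and a nominal class. */
final class ArffWriter {
    private ArffWriter() {}

    static void write(Writer out, String relation, List<String> featureNames, List<String> classLabels,
                      List<double[]> rows, List<String> rowLabels) {
        if (rows.size() != rowLabels.size()) {
            throw new IllegalArgumentException("each row needs exactly one class label");
        }
        PrintWriter pw = new PrintWriter(out);
        pw.println("@relation " + relation);
        pw.println();
        for (String name : featureNames) {
            pw.println("@attribute " + name + " numeric"); // one attribute per n-gram
        }
        pw.println("@attribute class {" + String.join(",", classLabels) + "}"); // class goes last
        pw.println();
        pw.println("@data");
        for (int r = 0; r < rows.size(); r++) {
            double[] values = rows.get(r);
            if (values.length != featureNames.size()) {
                throw new IllegalArgumentException("row " + r + " has the wrong number of values");
            }
            StringBuilder line = new StringBuilder();
            for (double value : values) {
                line.append(value).append(',');
            }
            line.append(rowLabels.get(r));
            pw.println(line);
        }
        pw.flush(); // the caller owns `out` and decides when to close it
    }
}
```

Passing a `FileWriter` as `out` writes the file to disk; a `StringWriter` is convenient for checking the output.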

Machine Learning Models

Machine learning is the computational technique of constructing models that are trained on an initial data set and can then make predictions about new data. Machine learning algorithms, such as Random Forest and Naive Bayes, use training data to generate classifiers that can assign labels to new data. For example, such algorithms can be trained on protein sequences' n-gram frequencies to decide whether a sequence is disordered or ordered. After n-gram analysis is applied to extract features from AMPs and each sequence is represented by a vector of n-gram frequencies, machine learning algorithms are employed to determine the class to which a peptide belongs from its vector of feature values. Algorithms including Random Forest and Support Vector Machine (SVM) learn from the frequencies of sequences in different classes to build classification models. WEKA, open-source machine learning software developed at the University of Waikato, was used for this purpose. The reliability of the n-gram analysis methodology was validated by first evaluating how accurately sequences from different subtypes could be classified.

Output from the n-gram Java programs was converted to ARFF format, which is compatible with the Waikato Environment for Knowledge Analysis (WEKA) software, version 3.8. WEKA contains machine learning tools to classify, cluster, and visualize a given data set. The WEKA Explorer GUI was used to classify sequences based on n-gram frequencies. The machine learning algorithms J48, Random Tree, Random Forest (with 10 trees), and SMO (Sequential Minimal Optimization) were used with 10- and 20-fold cross-validation.
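Cross-validation was run inside WEKA; the sketch below only illustrates how k folds can be formed, assuming stratification by class (each class's shuffled instances are dealt round-robin across the folds so that every fold keeps roughly the original class proportions). It is not WEKA's internal implementation, and `StratifiedFolds` is a hypothetical name. Each fold serves once as the test set while a model is trained on the other k - 1 folds, and accuracy is aggregated over all held-out predictions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical stratified k-fold assignment of instance indices. */
final class StratifiedFolds {
    private StratifiedFolds() {}

    /** Returns k lists of instance indices; every index appears in exactly one fold. */
    static List<List<Integer>> assign(List<String> labels, int k, long seed) {
        if (k < 2 || k > labels.size()) {
            throw new IllegalArgumentException("need 2 <= k <= number of instances");
        }
        Map<String, List<Integer>> byClass = new LinkedHashMap<>();
        for (int i = 0; i < labels.size(); i++) {
            byClass.computeIfAbsent(labels.get(i), key -> new ArrayList<>()).add(i);
        }
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            folds.add(new ArrayList<>());
        }
        Random random = new Random(seed);
        int next = 0; // keep dealing across classes so fold sizes stay balanced
        for (List<Integer> members : byClass.values()) {
            Collections.shuffle(members, random);
            for (int index : members) {
                folds.get(next).add(index);
                next = (next + 1) % k;
            }
        }
        return folds;
    }
}
```

With 20 AMPs and 20 non-AMPs and k = 10, each fold receives two sequences of each class.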

J48 is WEKA's implementation of the C4.5 decision tree algorithm, which uses pruning. The tree applies a series of binary queries to determine whether a sequence is antimicrobial or which class of AMP it belongs to. Random Forest builds many decision trees, each trained on a random bootstrap sample of the data and considering only a random subset of attributes at each split, and classifies a sequence by majority vote of the trees. Random Tree differs from J48 in that it does not prune; in addition, only a random subset of the attributes is considered when choosing the split at each node. SMO trains a support vector machine, which uses an (N-1)-dimensional hyperplane to divide an N-dimensional space into two regions, choosing the hyperplane that maximizes the margin between the two classes. Here, N is the number of attributes in each instance, not including the identifier. For example, when 3-grams are used with an alphabet reduction of 3 clusters, a 26-dimensional hyperplane divides a 27-dimensional space. Instances that fall on opposite sides of the hyperplane are assigned to different classes.
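To make the hyperplane description concrete, the sketch below shows only the decision step of a trained linear SVM: an instance x is assigned to one class when w · x + b ≥ 0 and to the other otherwise. The weights w and bias b would come from training (SMO in this study); the values in the usage are hypothetical, `LinearDecision` is an illustrative name, and non-linear kernels, which replace the dot product, are not covered.

```java
/** Hypothetical decision step of a trained linear SVM: the sign of w . x + b picks the class. */
final class LinearDecision {
    private LinearDecision() {}

    /** Signed score w . x + b; the hyperplane is the set of points where this equals 0. */
    static double score(double[] weights, double bias, double[] x) {
        if (weights.length != x.length) {
            throw new IllegalArgumentException("weights and x must have the same dimension");
        }
        double s = bias;
        for (int i = 0; i < x.length; i++) {
            s += weights[i] * x[i];
        }
        return s;
    }

    /** Points on or above the hyperplane get `positive`; points below it get `negative`. */
    static String classify(double[] weights, double bias, double[] x, String positive, String negative) {
        return score(weights, bias, x) >= 0 ? positive : negative;
    }
}
```

With 27 attributes (3-grams over a 3-cluster alphabet), `weights` and `x` both have length 27, and the set of points with score 0 is the 26-dimensional hyperplane.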

Results

The primary goal of this study was to classify antimicrobial peptides using a straightforward, sequence-based method that involved alphabet reduction, n-gram analysis (using either frequencies or likelihoods), and machine learning. More ambitious goals were classification among classes of AMPs and creation of an artificial set of AMPs. Success rates for some classification trials were comparable to those of previous studies by researchers conducting experiments with tangible AMPs in biochemistry laboratories.

The classification of AMPs against non-AMPs was successful. Models achieved a maximum accuracy of 85.0% using frequency n-gram analysis, alphabet reduction option 9, and the Random Forest model with 10 trees and 20-fold cross-validation. Many auxiliary experiments were also conducted with this dataset and alphabet reduction. Label randomization was used as a control to check the fidelity of the dataset: with randomized labels, all model accuracies were 50 ± 0.6%, the level expected from chance, implying that the models trained on the true labels for this dataset yielded reliable results. A learning curve was also created using dataset sizes ranging from 200 to 7,760 (the size of the full positive set). The curve was generally flat between these points, indicating that using small samples of the raw, full-sized sets did not cause accuracies to vary significantly. Finally, ROC (Receiver Operating Characteristic) curves were created and the area under each was computed. ROC area ranges from 0 to 1: an area near 1 indicates that the model almost always scores positive sequences above negative ones, and an area near 0.5 indicates performance no better than random guessing. ROC area was 0.9142 for 7,760 sequences, 0.8380 for 5,820 sequences, 0.8292 for 3,880 sequences, 0.916 for sequences, and 0.4978 for the control trial with 7,760 sequences. These high ROC areas suggest that the models were not substantially overfitted and that true positive rates were high relative to false positive rates.
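ROC area can be read as the probability that a randomly chosen positive sequence receives a higher classifier score than a randomly chosen negative one, with ties counting one half. The sketch below computes it directly from that definition by comparing every positive-negative pair; this is a standard equivalent formulation, not necessarily how WEKA computes it, and `RocArea` is a hypothetical name.

```java
/** Hypothetical helper: ROC area as the probability that a positive outscores a negative. */
final class RocArea {
    private RocArea() {}

    static double auc(double[] positiveScores, double[] negativeScores) {
        if (positiveScores.length == 0 || negativeScores.length == 0) {
            throw new IllegalArgumentException("need at least one positive and one negative score");
        }
        double wins = 0.0;
        for (double pos : positiveScores) {
            for (double neg : negativeScores) {
                if (pos > neg) {
                    wins += 1.0; // pair ranked correctly
                } else if (pos == neg) {
                    wins += 0.5; // tie counts as half
                }
            }
        }
        return wins / ((double) positiveScores.length * negativeScores.length);
    }
}
```

Perfectly separated scores give 1.0, and scores that carry no information give about 0.5, matching the control value of 0.4978. The pairwise loop is quadratic, which is fine for a few thousand sequences; sorting-based versions scale better.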

Classification among more specific classes of AMPs was conducted next. First, classification of ABPs against non-ABP AMPs achieved a maximum accuracy of 100% using likelihood n-gram analysis, alphabet reduction option 7, and the SMO model with 10-fold cross-validation. However, the average model accuracy for this trial was only 80.7%, so a learning curve was also constructed using dataset sizes ranging from 100 to 1,875, in increments of 75 to 200 sequences. Accuracy was lowest at a dataset size of 100 sequences and highest at 1,875 sequences, and it was nearly constant once the dataset exceeded 500 sequences. Second, classification of ABPs against non-ABP AMPs using frequency n-gram analysis, alphabet reduction option 7, and the SMO model with 10-fold cross-validation achieved a maximum accuracy of only 66.9%. A similar classification of ABPs against non-AMPs achieved a maximum accuracy of 83.6% using frequency n-gram analysis with the same alphabet reduction and machine learning model.

This study also included several other successful trials. A third trial, classifying ABPs against AVPs, yielded a maximum accuracy of 81.8% and an average accuracy of 77.2% using n-gram frequency analysis, alphabet reduction option 4, and the Random Forest model with 10 trees and 10-fold cross-validation, showing some evidence of consistency among models. A fourth trial, classifying AVPs against non-AVP AMPs, had a maximum accuracy of 80.7% and an average accuracy of 76.2% using n-gram frequency analysis, alphabet reduction option 9, and the Random Forest model with 10 trees and 20-fold cross-validation; models obtained an average accuracy of about 75%, again displaying consistency. A fifth trial, classifying AVPs against AFPs, achieved a maximum accuracy of 80.5% and an average of about 75% using n-gram frequency analysis, alphabet reduction option 9, and the Random Forest model with 10 trees and 10-fold cross-validation. A sixth and final successful trial, classifying AVPs against non-AMPs, achieved a maximum accuracy of 81.7% and an average of 76.1% using n-gram frequency analysis, alphabet reduction option 10, and the Random Forest model with 10 trees and 20-fold cross-validation.

However, this study also produced less successful results, with maximum accuracies between 70% and 80%. Trials yielding such results included ABPs against APPs, AVPs against APPs, and AFPs against APPs. Other trials obtained maximum accuracies below 70%: AFPs against non-AFP AMPs, ABPs against non-AMPs, AFPs against non-AMPs, APPs against non-AMPs, and ABPs against AFPs. Although not all trials yielded high accuracies, several trends emerged. First, Random Forest most often yielded the highest rate of correctly classified sequences among the models tested, followed by Sequential Minimal Optimization (SMO). Second, among all alphabet reductions used in this study, those with two clusters consistently obtained the lowest accuracies, whereas those with three or more clusters consistently obtained the highest accuracies. Third, models had considerable difficulty distinguishing between APPs and other classes of AMPs. In addition, models nearly failed to discriminate some classes of AMPs from non-AMPs.

Discussion

The straightforward, sequence-based classification of antimicrobial peptides was successful, and several trends emerged in the accuracies across all classification trials.

Machine Learning

First, Random Forest with either 10-fold or 20-fold cross-validation yielded the highest accuracy. This may be because Random Forest combines many different decision trees, each built from a different random sample of the data. Because it averages over many trees, Random Forest is less prone to overfitting than a single decision tree, which lends credibility to the accuracies obtained. Additionally, the Random Forest accuracies should not be dismissed as an artifact of machine learning, since the algorithm yielded results similar to those of J48, SMO, and Random Tree. Furthermore, ROC area values were consistently high, further supporting the validity and reliability of the Random Forest accuracies.

Alphabet Reduction

Second, throughout all 16 classification tests, the two-cluster alphabet reductions (1 and 2) never achieved the highest accuracy for a given classification. The two-cluster alphabets grouped residues mainly by hydrophobicity, an important characteristic of AMPs. However, reducing the original 20-letter alphabet to just two letters caused a severe loss of the information stored in the original amino acid sequence, so alphabet reductions with just two clusters were always outperformed by other alphabets. In addition, alphabet reduction 9 most often yielded the highest classification accuracy, being the most successful alphabet in 6 of 16 classifications. This suggests that a 4-cluster alphabet may be optimal for n-gram frequency analysis and machine learning: it simplifies amino acid sequences enough for efficient machine learning while keeping enough complexity that little of the information in the original sequences is lost. Furthermore, alphabet reductions 6, 7, and 10 each yielded the highest classification accuracy in exactly two classifications, showing that alphabet reductions with 3 or 5 clusters are also viable options for the sequence-based method used in this study.

Control Experiment

When used with label randomization, the sequence-based method introduced in this study was expected to yield a model accuracy of 50% and an ROC area of 0.5. Both expectations were met: all model accuracies were 50 ± 0.6%, and the ROC area for AMPs against non-AMPs was 0.4978. Label randomization thus verified the integrity of the dataset and indicated that the models used in this comparison yield reliable accuracies.
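A label-randomization control can be sketched as a shuffle of the class labels while the feature vectors stay fixed, which breaks any real association between n-gram frequencies and class; on balanced sets, a model retrained on the shuffled labels should then score near 50% accuracy and 0.5 ROC area. `LabelPermutation` is a hypothetical name, and the seed is arbitrary.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical label-randomization control: shuffle class labels, keep features unchanged. */
final class LabelPermutation {
    private LabelPermutation() {}

    /** Returns a shuffled copy; the i-th feature vector is then paired with the i-th shuffled label. */
    static List<String> permuted(List<String> labels, long seed) {
        List<String> shuffled = new ArrayList<>(labels); // same label counts, random order
        Collections.shuffle(shuffled, new Random(seed));
        return shuffled;
    }
}
```

Because the label counts are preserved, the shuffled sets remain balanced, so chance accuracy stays at 50%.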

Learning Curves

The learning curves showed that model accuracies varied only slightly with dataset size. The ABP against non-ABP AMP classification, with 1,875 sequences per set, yielded consistent accuracies above 500 sequences, and the AMP against non-AMP classification yielded consistent accuracies above 200 sequences. These findings indicate that even a small dataset would not cause a significant difference in model accuracy, which adds credibility to the accuracies reported in this study.

Classification of Peptides

Accuracies were highest when differentiating strictly between AMPs and non-AMPs. As mentioned before, the 85.0% accuracy for this classification is comparable to the accuracies of previous studies that used more complex methods of classification. Consistently high accuracy, high ROC area values, and flat learning curves all support a successful classification. Some of the class-specific classifications, most of which involved AVPs against another group of peptides, performed similarly to the primary AMP against non-AMP classification, achieving accuracies above 80%. Models may have discriminated AVPs more easily because viruses are unique compared to bacteria, fungi, and parasites. Viruses can replicate only inside host cells; being inside a host organism is not sufficient. Furthermore, although viruses contain genetic material and can replicate, they have no cell membrane of their own: they have a protein coat, and some also have a lipid envelope.

One class-specific classification that performed unusually well was ABPs against non-ABP AMPs, which achieved a maximum accuracy of 100% using likelihood n-gram analysis and alphabet reduction option 7. This high accuracy may be a result of the comprehensiveness of the ABP data set, which was drawn from multiple databases: it contained 1,875 sequences, a substantial fraction of the 7,760 sequences in the AMP data set. This apparent outlier may also indicate that likelihood n-gram analysis is better suited to our method of classification than frequency n-gram analysis, and that alphabet reduction option 7 is the best option for discriminating ABPs among all AMPs. Classifications that outperformed the label randomization trials heavily involved non-AMPs against classes of AMPs. Further, the AMP against non-AMP classification yielded accuracies upwards of 80%.

Interestingly, the classification of ABPs against AFPs yielded unusually low accuracies, possibly because of inherent similarities between ABPs and AFPs. First, some AMPs in the dataset may have been both ABPs and AFPs, slightly distorting the model and lowering maximum accuracies. Additionally, some fungi, such as yeasts, are unicellular and can reproduce asexually, as bacteria do. Fungi and bacteria also share the role of supporting multiple food webs by enriching the nutrient content of soil. Several classifications of AMPs were highly successful; however, additional trials may be required to further understand the data.

Conclusion and Future Work

The overarching purpose of this study was to introduce a straightforward method of antimicrobial peptide classification that would not only surpass the success rates of previous studies, but also advance the sequence-based classification of AMP subclasses. However, further computational experiments are necessary to corroborate and extend the results. Further research would also help explain a number of interesting results, including the relatively low model accuracy when classifying antibacterial against antifungal peptides. One possible reason for this discrepancy is that bacteria and many fungi share traits, such as unicellularity and a role as decomposers in ecosystems, that may impair the ability of machine learning models to differentiate between these subclasses.

Furthermore, the transduction technique, used to reduce the possibility of overfitting by balancing data sets, may not have been successful. For example, in the comparison of antibacterial peptides against antiparasitic peptides, we had an extremely small sample of 21 antiparasitic peptides compared with a relatively immense sample of 1,875 antibacterial peptides, so model accuracy was likely to vary noticeably depending on which sample of antibacterial peptides was taken. We propose two ways to potentially improve machine learning accuracy: bagging (bootstrap aggregating), which combines models trained on resampled versions of the data; and obtaining larger datasets of antiparasitic peptides. Accuracies may have been higher if several hundred APP sequences had been available. The results of this study indicate that sequence-based classification of antimicrobial peptides is a viable classification alternative with great potential for further research. The model performances across all of the machine learning algorithms indicate that an n-gram-based approach to differentiating the subclasses of AMPs, specifically antiviral, antibacterial, and antifungal peptides, is an effective and efficient method.

In summary, our models have much higher expected accuracies than random guessing, and our highest accuracies are comparable to those of previous studies by researchers conducting experiments with tangible AMPs in biochemistry laboratories. By continuing to improve the classification methods, biomedical researchers collaborating with other medical professionals could further advance the potential replacement of antibiotics with AMPs. As the models are tested with increased specificity, the discovery or synthesis of peptides that combat specific classes of microbes becomes more promising. Our results suggest that the classifiers produced have substantial predictive power and could be of significant use in various biological and medical applications, potentially saving hundreds of thousands of lives.


References

[1] Bahar, A. A., & Ren, D. (2013). Antimicrobial peptides. Pharmaceuticals, 6(12), 1543-1575. http://dx.doi.org/10.3390/ph6121543

[2] Centers for Disease Control and Prevention. (2016). Antibiotic / Antimicrobial Resistance [Pamphlet]. Retrieved from https://www.cdc.gov/drugresistance/

[3] da Costa, J. P., Cova, M., Ferreira, R., & Vitorino, R. (2015). Antimicrobial peptides: An alternative for innovative medicines? Applied Microbiology and Biotechnology, 99(5), 2023-2040. http://dx.doi.org/10.1007/s00253-015-6375-x

[4] Gautam, A., Sharma, A., Jaiswal, S., Fatma, S., Arora, V., Iquebal, M. A., . . . Kumar, D. (2016). Development of antimicrobial peptide prediction tool for aquaculture industries. Probiotics and Antimicrobial Proteins, 1-9. http://dx.doi.org/10.1007/s12602-016-9215-0

[5] Malmsten, M. (2014). Antimicrobial peptides. Upsala Journal of Medical Sciences, 119(2), 119-204. http://dx.doi.org/10.3109/03009734.2014.899278

[6] Oyinloye, B. E., Adenowo, A. F., & Kappo, A. P. (n.d.). Reactive oxygen species, apoptosis, antimicrobial peptides and human inflammatory diseases. Pharmaceuticals, 8, 151-175. http://dx.doi.org/10.3390/ph8020151

[7] Porto, W. F., Pires, Á. S., & Franco, O. L. (2012). CS-AMPPred: An updated SVM model for antimicrobial activity prediction in cysteine-stabilized peptides. PLoS ONE, 7(12). http://dx.doi.org/10.1371/journal.pone.0051444

[8] Reddy, K., Yedery, R.D., & Aranha, C. (2004). Antimicrobial peptides: Premises and promises. International Journal of Antimicrobial Agents, 24(6), 536-547. http://dx.doi.org/10.1016/j.ijantimicag.2004.09.005

[9] Taylor, J., Hafner, M., Yerushalmi, E., Smith, R., Bellasio, J., Vardavas, R., . . . Rubin, J. (2014). Estimating the economic costs of antimicrobial resistance: Model and results. Retrieved from RAND website: http://www.rand.org/randeurope/research/projects/antimicrobial-resistance-costs.html

[10] Thomas, S., Karnik, S., Barai, R. S., Jayaraman, V. K., & Idicula-Thomas, S. (2009). CAMP: A useful resource for research on antimicrobial peptides. Nucleic Acids Research, 38(1), D774-D780. http://dx.doi.org/10.1093/nar/gkp1021

[11] Torrent, M., Nogues, M. V., & Boix, E. (2012). Discovering new in silico tools for antimicrobial peptide prediction. Current Drug Targets, 13(9), 1148-1157. http://dx.doi.org/10.2174/138945012802002311

[12] World Health Organization. (2014, June). Antimicrobial resistance global report on surveillance. Retrieved from http://apps.who.int/iris/bitstream/10665/112642/1/9789241564748_eng.pdf

[13] Yeaman, M. R., & Yount, N. Y. (n.d.). Mechanisms of antimicrobial peptide action and resistance. Pharmacological Reviews, 55(1), 27-55. http://dx.doi.org/10.1124/pr.55.1.2

[14] Zuo, Y., Lv, Y., Wei, Z., Yang, L., Li, G., & Fan, G. (2015). iDPF-PseRAAAC: A web-server for identifying the defensin peptide family and subfamily using pseudo reduced amino acid alphabet composition. PLoS ONE, 10(12). http://dx.doi.org/10.1371/journal.pone.0145541