Machine learning models towards elucidating the plant intron retention code

dc.contributor.author: Sneham, Swapnil, author
dc.contributor.author: Ben-Hur, Asa, advisor
dc.contributor.author: Chitsaz, Hamidreza, committee member
dc.contributor.author: Peterson, Christopher, committee member
dc.date.accessioned: 2018-01-17T16:45:41Z
dc.date.available: 2018-01-17T16:45:41Z
dc.date.issued: 2017
dc.description.abstract: Alternative Splicing is a process that allows a single gene to encode multiple proteins. Intron Retention (IR) is a type of alternative splicing which is mainly prevalent in plants, but has been shown to regulate gene expression in various organisms and is often involved in rare human diseases. Despite its important role, not much research has been done to understand IR. The motivation behind this research work is to better understand IR and how it is regulated by various biological factors. We designed a combination of 137 features, forming an "intron retention code", to reveal the factors that contribute to IR. Using random forest and support vector machine classifiers, we show the usefulness of these features for the task of predicting whether an intron is subject to IR or not. An analysis of the top-ranking features for this task reveals a high level of similarity of the most predictive features across the three plant species, demonstrating the conservation of the factors that determine IR. We also found a high level of similarity to the top features contributing to IR in mammals. The task of predicting the response to drought stress proved more difficult, with lower levels of accuracy and lower levels of similarity across species, suggesting that additional features need to be considered for predicting condition-specific IR.
dc.format.medium: born digital
dc.format.medium: masters theses
dc.identifier: Sneham_colostate_0053N_14484.pdf
dc.identifier.uri: https://hdl.handle.net/10217/185669
dc.language: English
dc.language.iso: eng
dc.publisher: Colorado State University. Libraries
dc.relation.ispartof: 2000-2019
dc.rights: Copyright and other restrictions may apply. User is responsible for compliance with all applicable laws. For information about copyright law, please see https://libguides.colostate.edu/copyright.
dc.subject: intron retention
dc.subject: random forest
dc.subject: alternative splicing
dc.subject: SVM
dc.subject: machine learning
dc.title: Machine learning models towards elucidating the plant intron retention code
dc.type: Text
dcterms.rights.dpla: This Item is protected by copyright and/or related rights (https://rightsstatements.org/vocab/InC/1.0/). You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
thesis.degree.discipline: Computer Science
thesis.degree.grantor: Colorado State University
thesis.degree.level: Masters
thesis.degree.name: Master of Science (M.S.)

Files

Original bundle
Name: Sneham_colostate_0053N_14484.pdf
Size: 1.45 MB
Format: Adobe Portable Document Format