Machine Learning Approaches to Transcription Factor Binding Site Search and Visualization
Handle
http://hdl.handle.net/11134/20002:860647931
Persons
Creator (cre): Lee, Chih
Major Advisor (mja): Huang, Chun-Hsi
Associate Advisor (asa): Bi, Jinbo
Associate Advisor (asa): Rajasekaran, Sanguthevar
Associate Advisor (asa): Schwartz, Daniel
Associate Advisor (asa): Shin, Dong-Guk
Title
Machine Learning Approaches to Transcription Factor Binding Site Search and Visualization
Digital Origin
born digital
Description
A transcription factor (TF) is a protein or protein complex that regulates the expression of its target genes by physically binding to the regulatory regions of these genes. The binding sites of a TF naturally share a common pattern, or motif. Given known binding sites of a TF, a TF model can be built to scan sequences for putative binding sites. This is known as the transcription factor binding site (TFBS) search problem. In this dissertation, we investigate the TFBS search problem using machine learning approaches.
In general, the known binding sites of a TF are of variable length and have to be aligned before a model can be built. TFBS alignment is considered an unsupervised learning problem since no other information about the unaligned binding sites is given. We propose an algorithm that considers the lengths of TFBSs and the dependencies between nucleotide positions in a binding site. The novel method is named LASAGNA (Length-Aware Site Alignment Guided by Nucleotide Association).
Studies often use TFBS search tools to predict the binding sites of a TF in a DNA sequence when binding sites found by assays are not available. The analysis often involves TF model collection, promoter sequence retrieval, and visualization, and therefore requires several separate tools. To accelerate TFBS analyses, we developed a novel integrated web tool named LASAGNA-Search. This user-friendly tool allows users to perform the analysis without leaving the site.
TFBS search methods are considered supervised learning algorithms since they learn from example binding sites of a TF. Most TFBS search methods consider only the known binding sites of a TF and hence deal with one-class classification problems. However, non-binding sites contain information about the TF as well. When non-binding sites are available, searching for TFBSs becomes a two-class classification problem. We propose two novel methods, the negative-to-positive vector method and the optimal discriminating vector method, which use both binding sites and non-binding sites.
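As an illustration of the TFBS search problem described above, the following minimal sketch builds a position weight matrix (PWM) from already-aligned binding sites and scans a sequence for windows that score above a threshold. This is a generic one-class approach, not LASAGNA or either of the dissertation's proposed methods. The toy sites, the sequence, the uniform background of 0.25, the pseudocount, and the threshold are all assumptions chosen for the example.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from aligned, equal-length binding sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("sites must be aligned to the same length")
    pwm = []
    for pos in range(length):
        # Start every base at the pseudocount so unseen bases keep a finite score.
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        # Log-odds of the observed frequency against a uniform background.
        pwm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, window, score) for each window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(column[base] for column, base in zip(pwm, window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


# Hypothetical aligned sites and sequence, for illustration only.
sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGAGTAA"]
pwm = build_pwm(sites)
hits = scan("CCGTGACTCAGGATGAGTCATT", pwm, threshold=5.0)
for start, window, score in hits:
    print(start, window, round(score, 2))
```

The sketch assumes the sites are already aligned to a common length, which is exactly the step that the alignment problem in the description addresses for variable-length sites. It also learns from binding sites only, so it corresponds to the one-class setting rather than the two-class setting that uses non-binding sites.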
Organizations
Degree granting institution (dgg): University of Connecticut
Use and Reproduction
These materials are provided for educational and research purposes only.
Degree Name
Doctor of Philosophy
Degree Level
Doctoral
Degree Discipline
Computer Science and Engineering
Local Identifier
OC_d_304