Gene chip analysis
Introduction
The microarray is a powerful tool for genome analysis: a single experiment gives a global view of the genome. Data analysis is a vital part of a microarray study because it directly influences the final result. Each microarray experiment yields at least a thousand data points, and because a study comprises multiple experiments, a single study can yield tens of thousands of data points. As the volume of data grows exponentially, the analysis becomes a challenging task; in general, the greater the volume of data, the more chances arise for erroneous results. Handling such large volumes of data requires high-end computational infrastructure and programs that can handle multiple data formats. Programs for microarray data analysis are already available on various platforms, but because of rapid development, the diversity of microarray technology, and differing data formats, there is always a need for comprehensive and complete microarray data analysis.

Data analysis
Data analysis is the critical part of the whole workflow, since any error introduced at this stage will lead to biologically meaningless results. In data analysis, the information from the raw data file is processed further to yield meaningful biological results. This stage includes data normalization, flagging of the data, averaging the ratios across replicates, clustering of similarly expressed genes, and so on. Each replicate has to be normalized before further analysis; normalization removes the nonbiological variation between the samples. After normalization, the ratio is calculated for each gene in each replicate, and differentially regulated genes are determined from that ratio. Various statistical analyses are also performed for confidence analysis. Each replicate is additionally examined for experimental artifacts and bias by computing parameters related to intensity, background, flags, spot details, and so on.
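The normalize, ratio, and average steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the function names, the toy two-channel intensities, the choice of global median centering of log2 ratios as the normalization, and the twofold cutoff are all assumptions made for the example.

```python
import math
from statistics import mean, median

def log_ratios(red, green):
    """log2(red / green) for every gene in one replicate."""
    return [math.log2(r / g) for r, g in zip(red, green)]

def median_center(values):
    """Assumed normalization: shift log ratios so their median is 0 (ratio of 1)."""
    shift = median(values)
    return [v - shift for v in values]

def average_replicates(replicates):
    """Mean normalized log ratio per gene across all replicates."""
    return [mean(per_gene) for per_gene in zip(*replicates)]

def differential_genes(names, avg_log_ratios, fold=2.0):
    """Genes whose average change is at least `fold` in either direction."""
    cutoff = math.log2(fold)
    return [n for n, v in zip(names, avg_log_ratios) if abs(v) >= cutoff]

# Hypothetical data: (red channel, green channel) intensities per replicate.
genes = ["g1", "g2", "g3", "g4"]
raw_replicates = [
    ([400, 100, 200, 50], [100, 100, 200, 200]),
    ([1600, 200, 400, 100], [200, 100, 200, 200]),  # red channel reads twice as bright
]

normalized = [median_center(log_ratios(red, green)) for red, green in raw_replicates]
averaged = average_replicates(normalized)
hits = differential_genes(genes, averaged)
print(hits)  # ['g1', 'g4']
```

In the second toy replicate every red intensity is doubled, which stands in for a nonbiological effect such as dye bias; after centering, both replicates give the same log ratios, so the same two genes are reported.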
Replicates
It is necessary to conduct microarray experiments in replicates. As with any other quantitative measurement, repeated experiments make it possible to conduct confidence analysis and to identify differentially expressed genes at a given level of confidence. More replicates give more confidence in determining differentially expressed genes. In practice, three to five replicates would be ideal.

Normalization
Normalization is required to standardize the data and focus on biologically relevant changes. Many sources of systematic variation in microarray experiments affect the measured gene expression levels, such as dye bias, heat and light sensitivity, efficiency of dye incorporation, differences in the labeled cDNA hybridization conditions, scanning conditions, and unequal quantities of starting RNA. Normalization is an important step that adjusts the data set for technical variation and for differences in the relative abundance of gene expression profiles; it is the only point at which the analysis of one-color and two-color data differs. The normalization method depends on the data. The basic idea behind all normalization methods is that the expected mean intensity ratio between the two channels is one. If the observed mean intensity ratio deviates from one, the data are mathematically processed so that the final observed mean intensity ratio becomes one. Once the mean intensity ratio is adjusted to one, the distribution of gene expression is centered, so that genuine differences can be identified.

Quality control
Before biological variation is analyzed, quality control (QC) steps must be performed to determine whether the data are fit for statistical testing. Statistical tests are very sensitive to the nature of the input data.

Flag filtering
Filtering out spots with bad intensity is an important part of quality control. For example, every scanner has a certain limit below which the intensity values can no longer be trusted.
Typically, the lowest intensity value of reliable data is about 100–200 for Affymetrix data and 100–1000 for cDNA microarray data. These cutoffs are likely to change as scanners become more precise. Values below the cutoff are usually removed (filtered) from the data because they are likely to be artifacts.

Filtering of noisy replicates
Filtering out noisy replicates is one of the crucial parts of quality control. Experimental replicates should behave in a similar pattern, so noisy replicates should be eliminated before analysis. Noisy replicates can be removed using the ANOVA statistical method.

Filtering of nonsignificant genes
Nonsignificant genes are filtered out to reduce the number of genes, so that the analysis can be done on selected genes. Nonsignificant genes were removed by specifying a relative fold change with respect to the normal control. For overexpressed and underexpressed genes, the values given were 2 and 2. As a result of the filtering, only a few genes were retained; the remaining genes are then subjected to statistical analysis.

Statistical analysis
Statistical analysis plays a vital role in identifying the genes whose expression is statistically significant.

Clustering
Clustering is a data mining technique used to group genes that have similar expression patterns. Hierarchical clustering and k-means clustering are widely used techniques in microarray analysis.

Hierarchical clustering
Hierarchical clustering is a statistical method for finding relatively homogeneous clusters. It consists of two separate phases. Initially, a distance matrix containing all the pairwise distances between the genes is calculated. Pearson's correlation or Spearman's correlation is often used as the dissimilarity estimate, but other measures, such as Manhattan distance or Euclidean distance, can also be applied.
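The first phase, building the pairwise distance matrix, can be sketched as below. The gene profiles are made-up values, the function names are hypothetical, and converting Pearson's correlation r into a dissimilarity as 1 - r is one common convention that is assumed here rather than taken from the article.

```python
import math

def pearson_distance(x, y):
    """Dissimilarity 1 - r, where r is Pearson's correlation of two profiles.

    Assumes neither profile is constant (otherwise r is undefined).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1.0 - cov / (norm_x * norm_y)

def euclidean_distance(x, y):
    """Straight-line distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def distance_matrix(profiles, metric):
    """Symmetric matrix of all pairwise distances between gene profiles."""
    n = len(profiles)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = metric(profiles[i], profiles[j])
    return matrix

# Hypothetical expression profiles: one row per gene, one column per chip.
profiles = [
    [1.0, 2.0, 3.0],  # gene A
    [2.0, 4.0, 6.0],  # gene B: same trend as A, larger magnitude
    [3.0, 2.0, 1.0],  # gene C: opposite trend to A
]

corr = distance_matrix(profiles, pearson_distance)
eucl = distance_matrix(profiles, euclidean_distance)
```

With these toy profiles the correlation-based distance puts genes A and B close together (same trend) and A and C far apart (opposite trend), while the Euclidean distance also reflects the difference in magnitude between A and B.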
If the genes on a single chip need to be clustered, the Euclidean distance is the correct choice, since at least two chips are needed to calculate any correlation measure. After the initial distance matrix has been calculated, the hierarchical clustering algorithm either iteratively joins the two closest clusters, starting from single clusters (agglomerative, bottom-up approach), or iteratively partitions clusters, starting from the complete set (divisive, top-down approach). After each step, a new distance matrix between the newly formed clusters and the other clusters is calculated. Hierarchical cluster analysis methods include:
• Single linkage (minimum method, nearest neighbor)
• Complete linkage (maximum method, furthest neighbor)
• Average linkage (UPGMA)

K-means clustering
K-means clustering is an algorithm that classifies or groups genes, based on their expression pattern, into K groups, where K is a positive integer. The grouping is done by minimizing the sum of squared distances between the data and the corresponding cluster centroid. Thus the purpose of k-means clustering is to classify the data based on similar expression (www.biostat.ucsf.edu).

Gene ontology study
An ontology study gives biologically meaningful information, such as cellular location, molecular function, and biological process, about the genes that are differentially regulated in a disease or drug treatment condition with respect to the normal control.

Pathway analysis
Pathway analysis gives specific information about the pathways affected in the disease condition relative to the normal control. It also makes it possible to identify the gene network and how the genes in it are regulated.

Author
T. Hema Thanka Christlet, S. S. J. Shiek Fareeth Ahmed, A. Ahameethunisa, Janani Kannan. Dept. of Biotechnology, SRM University

This article is licensed under the GNU Free Documentation License. It uses material from the Wikipedia article "Gene_chip_analysis". A list of authors is available in Wikipedia. 