Arabidopsis Transcription Factors: Genome-Wide Comparative Analysis Among Eukaryotes



Genes coding for transcription factors are abundant in the genomes of eukaryotic organisms. The dicotyledonous plant Arabidopsis thaliana has a genome of approximately 125 megabase pairs (Mbp) of DNA containing more than 25,000 genes, and over 1500 of these genes encode transcription factors. The rice (Oryza sativa) complement of transcription factors is similar in size and composition to that of Arabidopsis. In addition, different cereals are known to share a similar repertoire and arrangement of genes in their genomes. Thus, the model plant Arabidopsis has a set of transcriptional regulators similar to those of the main staple crops rice, maize, and wheat. Moreover, many Arabidopsis transcription factor genes have been shown to retain their native functions when introduced as transgenes into other plant species.


Gene transcription (the synthesis of RNA molecules from a genomic DNA template) is carried out by many proteins with different biochemical activities acting in concert. These proteins can be classified into several functional groups: the basic transcription apparatus, large multisubunit coactivators, chromatin-related proteins, and transcription factors, which are the most numerous of these groups.

Transcription factors are proteins that bind DNA in a sequence-specific manner and can activate and/or repress transcription. They are responsible for the selectivity of gene regulation and are often themselves expressed in a tissue-specific, cell type-specific, temporally restricted, or stimulus-dependent manner. Transcription factors are modular proteins and can be grouped into families according to the type of DNA-binding domain they contain. Many of the biological processes in eukaryotic organisms are controlled at the level of gene expression, primarily through the regulation of transcription.

In plants, these processes include development, adaptation to the environment, defense against pathogens, and metabolic pathways. Moreover, many of the morphological changes that occurred during plant domestication and crop improvement are now known to have resulted from mutations in transcription factor genes, from alterations in their expression, or from changes in the expression of other types of regulatory proteins. This underscores the importance of this class of genes for plant and crop biotechnology.


The Arabidopsis thaliana genome was the first genome of a higher plant to be sequenced. It comprises approximately 125 Mbp of DNA and is compactly organized, with a high gene density. On average, there is one gene per 4.5 kilobases (kb) of DNA: approximately 2 kb corresponds to exons and introns, and approximately 2.5 kb to intergenic regions, which include regulatory sequences such as promoters and enhancers.
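As a rough consistency check, dividing the genome size by the average spacing between genes gives an expected gene count. The sketch below is only back-of-the-envelope arithmetic on the rounded values quoted above (125 Mbp, one gene per 4.5 kb, of which about 2 kb is genic); the function and variable names are illustrative, not taken from any genome annotation tool.

```python
def estimated_gene_count(genome_bp: float, bp_per_gene: float) -> float:
    """Estimate gene number from genome size and average spacing between genes."""
    return genome_bp / bp_per_gene

GENOME_BP = 125e6      # ~125 Mbp, rounded value from the text
BP_PER_GENE = 4.5e3    # one gene per ~4.5 kb on average
GENIC_BP = 2.0e3       # ~2 kb of exons and introns per gene
INTERGENIC_BP = 2.5e3  # ~2.5 kb of intergenic sequence per gene

genes = estimated_gene_count(GENOME_BP, BP_PER_GENE)
# Fraction of each average 4.5 kb unit that is occupied by the gene itself
genic_fraction = GENIC_BP / (GENIC_BP + INTERGENIC_BP)

print(f"estimated genes: {genes:,.0f}")                   # 27,778
print(f"genic fraction of genome: {genic_fraction:.0%}")  # 44%
```

The estimate of roughly 27,800 genes is of the same order as the 25,000 to 26,000 genes cited in the text; since all inputs are rounded averages, exact agreement should not be expected.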

Other plants, such as maize, have genomes much larger than that of Arabidopsis but with similarly organized coding and regulatory sequences. In monocots, active genes are usually clustered in compact gene-rich islands, separated by the repetitive sequences that make up much of the genomic DNA. Despite its small size, the Arabidopsis genome carries extensive duplications, including many tandem gene duplications and large-scale duplications between different chromosomes, which may affect about 40% of its genes. Duplications can hinder the functional analysis of genes because they often result in functional redundancy or overlap between the duplicated copies.

The Arabidopsis complement of transcription factor genes has been described and reviewed in detail elsewhere. In brief, the Arabidopsis genome codes for at least 1572 transcription factors (approximately 6% of its roughly 26,000 genes), which can be grouped into more than 45 gene families. This overall proportion of transcriptional regulators is comparable to that of other eukaryotic organisms.

However, many transcription factor gene families are known to differ greatly in abundance among the eukaryotic kingdoms, and some families are kingdom-specific. Approximately 45% of Arabidopsis transcription factors belong to plant-specific gene families, and approximately 53% belong to families found in plants, animals, and fungi. Several of the plant-specific families are large, including AP2/ERF, NAC, WRKY, ARF/IAA, and Dof.
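The percentages quoted for the Arabidopsis transcription factor complement can be turned into approximate gene counts. The following sketch simply multiplies the rounded shares (about 45% and 53%) by the 1572-gene total and checks the roughly 6% share of all genes; because the inputs are rounded, the counts are illustrative only. The remaining ~2% is deliberately left unassigned, since the text does not say which families it covers. The dictionary labels are descriptive names chosen for this example.

```python
TOTAL_GENES = 26_000  # approximate total gene count quoted in the text
TF_GENES = 1_572      # annotated transcription factor genes

# Approximate shares of transcription factors by family distribution (from the text)
shares = {
    "plant-specific families": 0.45,
    "families shared by plants, animals, and fungi": 0.53,
}

tf_fraction = TF_GENES / TOTAL_GENES
print(f"TF share of all genes: {tf_fraction:.1%}")  # 6.0%

# Convert each rounded share into an approximate number of genes
approx_counts = {group: round(share * TF_GENES) for group, share in shares.items()}
for group, count in approx_counts.items():
    print(f"{group}: ~{count} genes")

# Whatever is left (about 2%) is not attributed to either category in the text
remainder = TF_GENES - sum(approx_counts.values())
print(f"unassigned remainder: ~{remainder} genes")
```

With these inputs, roughly 700 genes fall into plant-specific families and roughly 830 into families shared across kingdoms, leaving a few dozen unassigned.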

Other families, such as MYB, MADS, and bZIP, which are not particularly numerous in animals or yeast, have been substantially amplified in the plant lineage. These differences point to a large degree of diversity in the transcriptional regulators of the different eukaryotic kingdoms. In general, most transcription factor families in Arabidopsis appear to be involved in a variety of biological functions, and, conversely, a given biological function can involve members of several different families. There are, however, some exceptions; for instance, MADS-box genes are most frequently involved in developmental processes.


Is the Arabidopsis complement of transcription factors representative of those of other plants, including crops? The sequencing of the rice genome, together with the large collections of cDNA sequences from other plants available in public databases, helps answer this question. Despite the very different appearance and lifestyle of Arabidopsis and rice, and even though the rice genome contains more genes in total, their complements of transcription factor genes are similar.

The largest transcription factor families in Arabidopsis also appear to be the most prevalent in monocotyledonous plants. In addition, many cases of orthology can be identified between Arabidopsis transcription factor genes and those of rice or maize. Putative orthologous MADS-box genes have frequently retained conserved functions, even after substantial sequence divergence. Moreover, Arabidopsis transcription factors from several different families have been shown to retain their function when introduced into a heterologous species, and vice versa. For example, LEAFY, a meristem identity gene that controls the switch to reproductive development in Arabidopsis, also triggers flowering when introduced as a transgene into aspen or citrus.

In summary, the transcription factor complements of monocots and dicots appear to be very similar in their general characteristics, and individual genes can retain their native function across species. Nevertheless, differences clearly exist. For instance, whereas most of the amplification of the MYB (R1)R2R3 gene family occurred before monocots and dicots diverged, several subgroups in maize appear to have originated recently or to have undergone recent duplication. These recent expansions could have allowed a functional diversification that may not be present in Arabidopsis. Conversely, some gene families are larger in Arabidopsis than in rice.


The vast majority of Arabidopsis transcription factors have not yet been characterized genetically or functionally. For those that have, characterization is usually limited to describing the phenotypic differences between mutant and wild-type plants and determining the genes' expression patterns. Very little is known about the target genes that each transcription factor regulates. Thus, the function of the Arabidopsis complement of transcription factors as a whole, and the dynamic relationship between the genome and its transcriptional regulators, remain largely unexplored. These areas of research can now be pursued with a variety of reverse genetic methods and functional genomic technologies. In addition to helping elucidate the complex logic of genome-wide transcriptional regulation in multicellular eukaryotes, such research will have a profound impact on plant biotechnology and agriculture.
