In the post-genome-sequencing age, growing expectations are being placed on biotechnology as a new industrial field of the 21st century. Biotechnology-based industries are being established in Asian countries as well as in Europe and the United States. In Japan, five government ministries and agencies have together launched the “Millennium Project” to foster biotechnology-based industries in academic, business, and governmental circles. Because they rest on a wealth of complicated biological information, biotechnology-based industries are sometimes referred to as information industries, and information is playing an increasingly important role in them. Since information technology (IT) is one of Japan’s strong points, Japan must aggressively integrate its IT and biotechnology expertise so that its biotechnology-based industries continue to grow and become internationally competitive. It is also necessary for Japan to promote cooperation in this field with neighboring Asian countries, which have tremendous biodiversity, human resources, and large markets. Japan is expected to be a major biotechnology information base in Asia. This paper describes various approaches to biotechnology-based industries both inside and outside of Japan. It also looks at the IT developments that Japanese biotechnology-based industries hope will be made in the near future.
The DNA Data Bank of Japan (DDBJ) was established at the Genetic Code Research Center of the National Institute of Genetics in 1986 to promote DNA sequence registration and research activities. The DDBJ, together with GenBank in the US and EMBL in Europe, forms the international nucleotide sequence database. The Genetic Code Research Center was established in 1984 and was renamed the Center for Information Biology in 1995. Then, in April 2001, the center was renamed again as the Center for Information Biology and DNA Data Bank of Japan (CIB/DDBJ). The DDBJ is a public research center and is the main international and public data bank in Japan. Because of its primary position, the DDBJ will play a very important role in the future development of bioinformatics in Japan. The supercomputer system at the National Institute of Genetics is designated as a cooperative facility of Japan. Its HPC system, which includes a VPP5000, is open to external researchers, and the DNA data registration, analysis, and search services of the DDBJ are made available worldwide. Fujitsu has helped the DDBJ to introduce, construct, and operate this supercomputer system and has supplied system engineers to develop and operate the DDBJ services. This paper introduces the DDBJ system and services of the National Institute of Genetics and describes their future development.
The JBiC biotechnology database system is an integrated database system operated by the Japan Biological Informatics Consortium (JBiC) for biotechnology researchers. This system enables joint use of various kinds of biotechnology-related databases and is intended mainly for researchers at the biotechnology-related companies participating in the JBiC. This paper first describes the outline and features of the JBiC biotechnology database system. Next, it describes the application and effects of the system's agent technology for cross-sectional searching of multiple biotechnology databases of various kinds, which is an essential part of the system. Then, this paper describes the mechanism and background of data encryption and search-analysis results acquisition within the system, both of which are done using one-time passwords. Lastly, this paper describes how this system is made available to multiple users while maintaining the security of the information it contains.
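The abstract does not say how the one-time passwords are generated. As a minimal sketch, assuming a counter-based scheme such as HOTP (RFC 4226), a one-time password can be derived from a shared secret and a counter as follows. The function name `hotp` and the choice of HOTP itself are illustrative assumptions, not a description of the JBiC system.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """Counter-based one-time password as defined in RFC 4226 (assumed scheme)."""
    # The counter is encoded as an 8-byte big-endian integer.
    message = struct.pack(">Q", counter)
    digest = hmac.new(secret, message, hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte select an offset.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)
```

Client and server each keep the secret and the counter; because a password is valid for one counter value only, an intercepted password cannot be replayed for a later request.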
The post-sequencing age has begun, and single nucleotide polymorphisms (SNPs) have attracted attention in DNA polymorphism research. SNPs, which are found at a constant rate on the genome, are very useful markers for disease-related gene research and drug-effect studies. Huge volumes of SNP information have already been registered in public SNP databases, and many researchers have approached such data in various ways. This paper describes the creation and use of a new SNP Catalog DB system that integrates the SNP information stored in several databases and makes it available for use.
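How the SNP Catalog DB integrates its sources is not stated in the abstract. One plausible approach, sketched here under the assumption that records from different databases can be matched by chromosome and position, is to merge them into one entry per locus while keeping each source's identifier. The record fields (`id`, `chrom`, `pos`, `alleles`) and the database names used with it are hypothetical.

```python
def merge_snp_records(sources):
    """Merge SNP records from several databases into one entry per locus.

    sources maps a database name to a list of dicts with the hypothetical
    keys "id", "chrom", "pos", and "alleles" (for example "A/G").
    """
    catalog = {}
    for db_name, records in sources.items():
        for rec in records:
            key = (rec["chrom"], rec["pos"])
            entry = catalog.setdefault(
                key,
                {"chrom": rec["chrom"], "pos": rec["pos"],
                 "alleles": set(), "ids": {}},
            )
            # Union the alleles and remember where each record came from.
            entry["alleles"].update(rec["alleles"].split("/"))
            entry["ids"][db_name] = rec["id"]
    return [catalog[key] for key in sorted(catalog)]
```

Keeping the per-source identifiers lets a user trace an integrated entry back to the original database records.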
The GeneDiscovery system supports functional analysis of cDNA and genome sequence data. It helps bioinformatics researchers to reduce their workloads with automatic collection of function-related annotated information, original analytic methods, and automation of analysis work. The system is useful for searching for target compounds, which is the major task in the search for gene-based drugs. For DNA sequence analysis, the GeneDiscovery system enables researchers to collect function-related annotated information automatically on the basis of homology search and LocusLink data, extract meaningful subsets of genes by alignment analysis, and search for motifs. For amino-acid sequence analysis, the system enables researchers to predict the antigenic determinants of a protein by predicting secondary structure, flexibility, and hydrophobicity based on the converted amino-acid sequence. This paper describes the roles and features of the GeneDiscovery system.
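The abstract does not specify how hydrophobicity is predicted. A common approach, shown here as a minimal sketch and not as GeneDiscovery's method, is a sliding-window average over a per-residue scale. The Kyte-Doolittle scale and the default window of 7 are assumptions made for illustration.

```python
# Kyte-Doolittle hydropathy values (an assumed choice of scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=7):
    """Return the mean hydropathy of every window along the sequence."""
    sequence = sequence.upper()
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]
```

Strongly negative windows mark hydrophilic stretches, which are more likely to be exposed on the protein surface and are therefore often examined as candidate antigenic regions.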
Partial amino-acid sequences that homologous proteins have in common are called motifs. Motifs are extremely important in predicting the functions of proteins. Since 1990, Fujitsu has been collaborating with the National Institute of Genetics (NIG) in developing molecular evolutionary analysis software called SODHO, which automatically extracts the conserved profiles (sequence patterns) of the amino acids of homologous proteins and generates motifs represented by regular expressions. Recently, we have been extending SODHO to represent motifs with hidden Markov models, which makes SODHO a more practical system. SODHO has been applied to the novel proteins obtained from full-length cDNA projects and has provided excellent results. In this paper, we discuss the structure of SODHO and the latest results obtained by using it.
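To illustrate the idea of a motif expressed as a regular expression, the following sketch derives a pattern from an ungapped block of aligned sequences. It is not SODHO's algorithm: the function name, the `max_variants` threshold, and the rule of writing a wildcard for variable columns are all assumptions.

```python
import re


def motif_from_alignment(aligned, max_variants=3):
    """Build a regular expression from an ungapped block of aligned sequences."""
    if not aligned:
        raise ValueError("at least one sequence is required")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must have the same length")
    if any(not seq.isalpha() for seq in aligned):
        raise ValueError("gaps and non-letter symbols are not supported")
    parts = []
    for column in zip(*aligned):
        residues = sorted(set(column))
        if len(residues) == 1:
            parts.append(residues[0])            # fully conserved position
        elif len(residues) <= max_variants:
            parts.append("[" + "".join(residues) + "]")  # few alternatives
        else:
            parts.append(".")                    # too variable: any residue
    return "".join(parts)


def find_motif(pattern, sequence):
    """Return the start positions of every match of the motif in a sequence."""
    return [match.start() for match in re.finditer(pattern, sequence)]
```

For example, the block `CAKEC`, `CGKDC`, `CSKEC` yields the pattern `C[AGS]K[DE]C`, which can then be searched for in an uncharacterized protein sequence.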
Genetic statistical analysis of genome sequences using polymorphic markers is one of the most effective investigative procedures in the search for disease-related genes. Typical markers are microsatellite markers (repetitions of two to four nucleotides) and single nucleotide polymorphisms (SNPs). Significant advances are being made in the study of how to apply this analysis procedure. We researched genetic statistical analysis in collaboration with the Institute for Genome Research of the University of Tokushima. This paper gives a detailed description of the theory and procedure of the genetic statistical analysis that we researched, the problems encountered during the research, and an example application of IT in the search for disease-related genes together with its effectiveness. This paper also explains polymorphic genetic markers, which are important for polymorphic genetic analysis.
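The abstract does not name the statistical tests used. As one simple illustration of marker-based analysis, and only as an assumed example, the sketch below runs a chi-square test of allelic association on a 2x2 table of allele counts in cases and controls (one degree of freedom, no continuity correction).

```python
import math


def allelic_chi_square(case_a, case_b, control_a, control_b):
    """Chi-square test on a 2x2 table of allele counts (cases vs. controls).

    Returns the statistic and its p-value for one degree of freedom.
    """
    n = case_a + case_b + control_a + control_b
    row_case = case_a + case_b
    row_control = control_a + control_b
    col_a = case_a + control_a
    col_b = case_b + control_b
    denominator = row_case * row_control * col_a * col_b
    if denominator == 0:
        raise ValueError("every row and column must have a nonzero total")
    chi2 = n * (case_a * control_b - case_b * control_a) ** 2 / denominator
    # For one degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

A small p-value suggests that the allele frequencies differ between cases and controls at that marker. Scanning many markers this way requires a correction for multiple testing, which the sketch leaves out.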
Recent advances in biotechnology have made it possible to determine genome sequences as well as the amino-acid sequences and structures of various proteins. As a result, more and more drugs are being designed based on the structures of proteins. Demand has also increased for information about proteins obtained by computational chemistry methods and for the use of this information in drug design. In this paper, we introduce the MOPAC and MASPHYC software packages for computational chemistry applications. Next, we introduce some of the new knowledge we have obtained by applying the molecular orbital method to general protein systems using MOPAC and introduce a molecular dynamics simulation of a protein-water system using MASPHYC. Finally, we introduce our activities in the development of an in silico screening system that makes effective use of this computational chemistry technology in drug research.
Cheminformatics is a generic name for the information technologies (IT) used to study and implement methods of analyzing and processing the vast amounts and many kinds of data derived from chemical research and of storing that data in databases. This paper explains how cheminformatics systems have evolved and progressed, describes the present technical trends and problems related to cheminformatics, and forecasts the future directions of cheminformatics systems and Fujitsu's activities in this field. This paper also describes how important it is to combine cheminformatics systems with bioinformatics (biology information engineering) systems, which are often paired with each other in practical applications.
Recently, it has become apparent that cytochrome P450 (CYP) is widely distributed among species and plays various important roles, although it was thought to be an enzyme participating only in special reactions when it was first discovered. About 10% of the several hundred kinds of CYP are xenobiotic-metabolizing (drug-metabolizing) enzymes of mammals. These CYPs have a very wide substrate specificity and are the key enzymes in drug-metabolizing processes. CYP genes have single nucleotide polymorphisms (SNPs). It is thought that SNPs are partly responsible for the differences in drug effects among individuals and that CYP activity is connected with the carcinogenic activities of chemicals. Therefore, there is a demand for a database of CYP information and CYP metabolic products to assist in the research, development, and administration phases of drugs. This paper describes the history of CYP research and xenobiotic metabolism by CYP. It also introduces the CYP information database and metabolic-product prediction system that we are now developing. This database and system make it possible to search with a chemical structure diagram or a metabolic reaction diagram and to predict the metabolites of a target chemical based on the metabolic reaction data in the database.
Now that a large number of genome sequences, including the human genome sequence, are widely available, biomedical research is shifting to the phase called the "post-genome era." Protein-protein interaction is an important research issue in this era for understanding the mechanisms of biological processes. An enormous amount of data, including gene expression data and genome sequences, is required to elucidate these interactions, and computers will be used to help researchers analyze and interpret the experimental data. In particular, the knowledge resources that have been accumulated in various languages should be useful for converting numerical data into biologically significant information. This paper discusses the role of biological literature databases, especially how they help researchers interpret biological data. This paper also introduces our XML document retrieval technology, which can be used as a foundation for building biological research assistance systems.
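The abstract gives no detail about the retrieval technology itself. To show what retrieval over XML-structured literature can look like in its simplest form, the sketch below searches for a keyword within one chosen element of each record, using Python's standard `xml.etree.ElementTree`. The schema (`articles`, `article`, `title`, `abstract`, `term`) and the sample records are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical literature records; the element names are assumptions.
SAMPLE_XML = """<articles>
  <article id="a1">
    <title>Protein-protein interaction screening</title>
    <abstract>We report an <term>interaction</term> map built from two-hybrid data.</abstract>
  </article>
  <article id="a2">
    <title>Gene expression profiling</title>
    <abstract>Expression levels were measured across tissues.</abstract>
  </article>
</articles>"""


def search_articles(xml_text, keyword, field="abstract"):
    """Return (id, title) for each article whose chosen field contains keyword."""
    root = ET.fromstring(xml_text)
    keyword = keyword.lower()
    hits = []
    for article in root.iter("article"):
        node = article.find(field)
        # itertext() also collects text inside nested markup such as <term>.
        text = "".join(node.itertext()) if node is not None else ""
        if keyword in text.lower():
            hits.append((article.get("id"), article.findtext("title", default="")))
    return hits
```

Restricting the match to a named element is what distinguishes this from plain full-text search: a query can target titles, abstracts, or annotated terms separately.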
Now that the human genome has been read, researchers in the field of bioinformatics are turning their attention to how they can make full use of the enormous amount of information they have obtained and what kind of knowledge they can acquire from it. To support bioinformatics in the post-genome era, we have proposed an integrated solution called the Post-Genome Platform as a backbone environment for research and development. The Post-Genome Platform consists of a high-speed, high-capacity hardware platform, a basic bioinformatics program library optimized for the hardware platform, a high-speed search engine, XML-related tools, a Web browser optimized for bioinformatics, and a knowledge service accommodator that integrates all of these components and original data or methodologies together so that bioinformatics processing can be executed to create new value. We have developed and improved these components based on the Post-Genome Platform concept and are currently applying them to practical bioscience research for testing. This paper introduces the Post-Genome Platform and describes some of its components.
NetLaboratory is an online community for researchers and engineers that is designed to promote the growth and spread of science and technology. NetLaboratory makes various kinds of information available from a website and provides an infrastructure for business on its network. Much of the information that is used in bioinformatics research and development is supplied via the Internet. However, we face various problems when we try to construct a gene database system for using the supplied information or a Web server system for supplying information externally. This paper introduces NetLaboratory and describes two example services provided on the Internet through the NetLaboratory infrastructure: primer design for gene amplification and an SNP catalog database. In this paper, we show how NetLaboratory's infrastructure enabled us to solve the problems mentioned above through outsourcing.
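The abstract does not describe how the primer design service evaluates candidates. Two quantities that primer design tools commonly check are GC content and melting temperature; the sketch below computes both, using the Wallace rule (2 °C per A or T, 4 °C per G or C) as an assumed rough estimate that is only appropriate for short oligonucleotides. The function names are hypothetical.

```python
def _validate(primer):
    """Upper-case the primer and reject anything other than A, C, G, T."""
    primer = primer.upper()
    if not primer or set(primer) - set("ACGT"):
        raise ValueError("primer must be a non-empty string of A, C, G, T")
    return primer


def gc_content(primer):
    """Fraction of bases in the primer that are G or C."""
    primer = _validate(primer)
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Rough melting temperature in degrees Celsius by the Wallace rule."""
    primer = _validate(primer)
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc
```

In practice a forward and a reverse primer are chosen so that their estimated melting temperatures are close to each other, and more accurate nearest-neighbor models replace the Wallace rule for longer primers.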