The day a machine learning article caught my attention
At the end of 2016, as a graduate student, I came across an article on a social networking site about using deep learning to generate illustrations. Although the quality was much lower than that of today's generative AI, I was amazed that the generated images were accurate enough to be recognizable as human faces and, what's more, that they were generated from random noise. From this article, I learned that the method was based on a technology called the generative adversarial network (GAN), which can generate and transform images with high accuracy, and by investigating this technology I discovered the whole fascinating subject of machine learning. From that point on, despite being an electronics engineering major, I studied the theory of machine learning and the mathematics behind it, read the articles about it in depth, started programming on my own, and built my own GPU environment to reproduce the generative AI described in the article. I was hooked, and every time an interesting paper such as CycleGAN (*1) was published, I spent my days reproducing and implementing it. At the same time, I started job hunting and began to focus on my desire to become involved in machine learning-based development.
Aiming to fully automate fossil appraisal
After joining Fujitsu, I was assigned to the AI Service Division and participated in a project (*2) on stratigraphic dating by automatic nannofossil detection, which was conducted in collaboration with INPEX, JOGMEC, and Akita University. I learned two important things from this project. The first is that it is necessary to have a deep understanding of the subject and to acquire knowledge about it. The second is that acquiring deep knowledge requires close communication with the client.
Nannofossils, the target of the detection, are so difficult to appraise that they require experts, and without an understanding of the key points that distinguish them, it is impossible to develop highly accurate AI models. We deepened our understanding of nannofossils through regular communication with the experts. Being able to ask questions directly, and sometimes to observe nannofossils under microscopes in the field, was key to developing our understanding. In the beginning, the AI model was not very accurate, and we had to devise new data input methods and adjust parameters through a process of trial and error. Eventually, we were able to create a highly accurate AI model, and our own understanding reached a level where we could make an approximate appraisal by ourselves.
In another project I was involved in later, we proposed a deep learning solution to a customer and it was highly rated. Through these experiences, I learned about the development of deep learning models, as well as software development methods, how to proceed with machine learning projects, and how to communicate effectively with customers and conduct business. When I considered my future career, I was not sure which path to take. My options were to take a system engineer position closer to the business scene or to opt for a research position to pursue technical expertise. In the end, I decided to return to my research roots and deepen my research in machine learning, and I transferred to Fujitsu Research through the in-house posting system.
A red-letter day: en route to becoming a researcher
Since moving to Fujitsu Research, I have conducted joint research with two universities in particular. With the first university, we conducted joint research on scene graphs. A scene graph is a data structure that expresses the relationships between objects in an image. It is a graph in which objects are nodes and relationships between objects are edges, and its characteristic feature is that the relationships themselves are predicted. The issue was how to acquire the relationships between objects as a representation, and we proposed a method to do so. I was delighted when the paper describing this research (*3), my first major work, was accepted by the international conference WACV2023 (*4). I felt that I had finally taken my first big step as a researcher.
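The scene graph structure described above can be sketched in a few lines of Python. This is only an illustration of the data structure, not the method proposed in the paper: the class names, the helper methods, and the example objects and relationships are all hypothetical, and the edges here are written by hand, whereas in a real system a model would predict them from an image.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relation:
    """A directed edge: subject --predicate--> object (hypothetical type)."""
    subject: str
    predicate: str
    obj: str


@dataclass
class SceneGraph:
    """Objects in an image are nodes; relationships between them are edges."""
    objects: set[str] = field(default_factory=set)
    relations: list[Relation] = field(default_factory=list)

    def add_relation(self, subject: str, predicate: str, obj: str) -> None:
        # Adding an edge also registers both endpoints as nodes.
        self.objects.update((subject, obj))
        self.relations.append(Relation(subject, predicate, obj))

    def relations_of(self, name: str) -> list[Relation]:
        # Every edge in which the named object takes part, in insertion order.
        return [r for r in self.relations if name in (r.subject, r.obj)]


# Hand-written example data (an assumption for illustration only).
graph = SceneGraph()
graph.add_relation("person", "riding", "horse")
graph.add_relation("horse", "standing on", "grass")

horse_edges = [(r.subject, r.predicate, r.obj) for r in graph.relations_of("horse")]
```

Storing each relationship as a (subject, predicate, object) triple keeps the direction of the edge explicit, so "person riding horse" is not confused with "horse riding person".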
In the second joint research project, with the University of Toronto, we worked on developing technology to extract the necessary information from complex data and then to implement AI according to the stated purpose. In this research, we had regular meetings with Dr. Jimmy Ba of the University of Toronto. Dr. Ba is a co-author of Adam (*5), an optimization technique, and I always looked forward to his solid advice, which was backed by his numerous achievements and forward-looking ideas. The result of our joint research, "Multi-Rate VAE: Train Once, Get the Full Rate-Distortion Curve," was accepted (*6) to ICLR2023, a top conference in the field of AI and machine learning, and I believe that this achievement may have contributed, even if just a little, to enhancing the presence of Fujitsu and the University of Toronto.

Composing music plays an important role
I have taken piano lessons since childhood and have played a lot of classical and game music. Music took root in me, and practice became an important part of my daily routine. During my university years, I became interested in composing music and used specialized software to write songs and sometimes add vocals. When composing, I first listened to and analyzed various genres of music, and then I added my own color while referring to other people's music. I believe that my whole mindset toward pursuing specialized subjects was cultivated through these experiences. In R&D, I first read a large number of related papers to determine a research theme. Then I solidify the experimental foundation by reproducing and implementing the contents of those papers before moving on to implementing my own ideas.
Towards AutoML and further research
I am currently working on R&D of AutoML (*7) at Fujitsu Research of America (FRA). One of my research goals is to create the ultimate foundation model, one that extracts meaningful features and patterns from various types of data, such as text, images, and audio, and can be applied to any task. With my transfer to an overseas research unit, my work and living environment have changed drastically. Although I am still adjusting to the unfamiliar environment, my goal is also to find new research themes that are not confined to AutoML and to go on to produce results. This is because the excellent researchers I know are those who can derive excellent research themes on their own rather than only working on a given topic. To get as close as possible to such people, I decided to come to the U.S. to engage in friendly competition with a diverse range of people, and I am advancing my research on a daily basis.

I am interested in the application of machine learning to other fields from an interdisciplinary perspective, and there was a period during my graduate school days when I considered the possibility of using machine learning to search for device structures. At the time, I had only a shallow knowledge of machine learning and was conducting misguided experiments, but I believe I could take a different approach now. In this context, the groundbreaking work of AlphaFold, which accurately predicts protein structures, made a particularly strong impression on me. I am amazed that the work was also well validated for its practicality and is now being used in many situations. While I am currently at the stage of studying various fields, I aspire to contribute to the world by producing research results that draw on diverse perspectives and can help people around the globe lead happier lives in the future.

Artificial Intelligence Laboratory
Graduate School of Electronic Engineering
Joined Fujitsu in 2018
Life is like a box of chocolates. You never know what you're gonna get until you open it up.
Editor's note
In recent years, AI technologies have developed so rapidly that it is not unusual for a technology that was mainstream yesterday to be a tributary tomorrow. For example, when So first started working on machine learning, GAN was the go-to for image generation, but now it has been replaced by diffusion models and others. "However, I don't think GAN has become obsolete. The quality of generated images is just one of several evaluation metrics for generative models. I want to choose and pursue the technology that aligns with the task I want to solve, without being swayed by trends," he explains. Additionally, at the end of the interview, So smiles and says, "I am grateful to my spouse, who is intelligent and dynamic. Like her, who sometimes takes risks to expand her options and future possibilities, I also want to keep taking on challenges without limiting myself." It was a wonderful story to hear.
- (*1) CycleGAN refers to a method that achieves image transformation by learning the relationship between the domains (fields or regions) of two image data sets.
- (*2)
- (*3)
- (*4) WACV (Winter Conference on Applications of Computer Vision) is an international conference in the field of computer vision.
- (*5) Adam is one of the most widely used optimization techniques in deep learning.
- (*6)
- (*7)
- Note: (*2) is only available in Japanese.