Computational Biology: Using Computers to Solve Complex Biological Problems
Doctor Goswami joined the Division of Agriculture last year after working in human-focused medical research at Yale and Stanford Universities. His work is boosting research for the poultry science and animal science departments, as well as the Experiment Station Center for Agricultural Data Analytics.
Doctor Goswami, how do you explain bioinformatics to someone who has never heard of it?
Aranyak Goswami: Bioinformatics is, in very layman’s terms, using computer science to solve biological problems. And these biological problems can be therapeutic things, like giving rise to better therapeutic strategies with the help of computational drug targets that you give. If you know about the Human Genome Project, which happened in the early 2000s, that was actually the birth of modern bioinformatics, because we have these sequences of genes and, to put it in very layman’s terms, we get hundreds and thousands of genes from humans. Actually, 40,000 [closer to 20,000 by current estimates] of the coding genes, to be precise. And that sequence was done with the help of computational approaches. And that gave rise to this modern field of bioinformatics.
Now, with the genetic tools in our hands, nowadays we can extrapolate this not only to humans, but to any kind of model system we want to do. We can also say that there are a lot of agricultural and plant sciences, animal sciences, animal models. You can extrapolate this. And if I put it in a very non-sciencey way, then it will be using computer science knowledge to solve biological problems and also, with the advent of AI, doing a lot of predictive analysis.
JL: When we talked about it before, I kind of connected it to putting together a puzzle with millions of pieces. Do you put together puzzles in your free time?
AG: What I like to do, I liked to put together puzzles or Lego kinds of things when I was a kid. What I really liked, to keep me sharp, is I played chess a lot. Chess puzzles I solve. Bioinformatics is a lot about pattern discovery, pattern finding, pattern matching. And solving chess puzzles in a kind of limited time gives you that mental agility. So, you can identify patterns, go for these sequences. So I was always an avid chess player. I just do this one or two rounds every day to keep me mentally agile. And that is very close to puzzle solving, like finding patterns and the best possible move within the shortest possible time. And then I sometimes take the help of computers to analyze whether my move was the best or something.
JL: When people hear about gene sequencing, is that people like you or a combination of different disciplines?
AG: Yes. So when we talk about gene sequencing, there are two aspects of it. So the one aspect is, first of all, the experimental aspect. So if I just explain it in very simple terms, everybody knows about DNA, which is the basic genetic material of the cell. So experimental biologists basically extract the whole DNA from an individual. And this DNA, if you see, actually is a bearer of all the genes that we have in our hand.
So there are different techniques, experimental techniques, that people use to make this experimental genetic profiling. Now, our work as bioinformaticians comes next. So when you have these different portions of genes present in different regions of the chromosome, our job is kind of piecing these things together, in a computational way, so that you can get the complete information.
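The piecing-together step described here can be illustrated with a toy example. The Python sketch below is not the method used in any real sequencing pipeline; it assumes error-free fragments and a simple greedy strategy, and the names `overlap` and `greedy_assemble` and the sample fragments are made up for illustration. It repeatedly joins the two fragments whose ends overlap the most until one sequence remains.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that is a prefix of `b`."""
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments, min_len: int = 3):
    """Greedily merge the pair of fragments with the largest overlap.

    Toy illustration only: assumes error-free reads and no repeats.
    Returns the list of sequences left when no overlaps remain.
    """
    frags = list(fragments)
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to join
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Hypothetical fragments that overlap by three letters each.
print(greedy_assemble(["ATGCGT", "CGTACC", "ACCTGA"]))
```

Real assemblers have to cope with sequencing errors, repeated regions and millions of fragments, which is why the greedy approach here is only a picture of the idea.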
Now, having said that, that was the early days of bioinformatics. Now we have moved a lot forward. So the initial goal was to map the genes. But now what we can do is that we can piece together information, also find out relationships between the genes, and something we used to call “dark matter,” which is actually the non-coding elements of the gene.
And recent research has shown that these non-coding elements are more interesting than the coding elements. Now, therefore, with the help of these computational tools, we can analyze and find relationships between the gene regions and the interaction with the non-genetic or non-coding regions. That gives us a lot of perspective about how regulation of a gene happens, which means that whatever phenotype that we see, a particular behavior, is not only genetic, but there is also what is called phenotypic data, which is not just transmitted from one individual to the next in the usual hereditary manner, but based upon the lifestyle things you are taking.
For example, the chromosome can have methylation patterns which are specific to certain individuals, which gets us to the whole field of epigenetics. And this is the genomics aspect, which I am an expert in, that I am talking about. And now, with the advent of machine learning, there are a lot of predictive things you can do. If you have a list of certain parameters, it can be genes or it can be any kind of information. So you can just put it in a computer, I’m just putting it in very layman’s terms. And then, based upon that, you can identify some signatures of the data, which are the predictive markers. And then you get new kinds of similar data. You can see whether this data will be following that predictive behavior or not.
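The idea of learning a signature from known data and then checking new data against it can be sketched in a few lines of Python. This is a minimal illustration, assuming a nearest-centroid classifier as a stand-in for whatever model a real study would use; the function names, the two made-up measurements per sample and the labels are all hypothetical.

```python
def fit_centroids(samples, labels):
    """Average the feature values of each class to get one 'signature' per label."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict(centroids, x):
    """Assign a new sample to the label whose signature is closest (squared distance)."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(x, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


# Made-up training data: two measurements per sample, two groups.
samples = [[5.0, 1.0], [6.0, 1.5], [1.0, 4.0], [1.5, 5.0]]
labels = ["group_a", "group_a", "group_b", "group_b"]

signatures = fit_centroids(samples, labels)
print(predict(signatures, [5.2, 1.1]))  # a new sample to classify
```

The averaged values play the role of the "signature", and a new sample is judged by which signature it sits closest to.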
And this has a great application. So not only in the biology field, but if you see, whenever you are typing your prompt on Apple, this comes from predictive text. And now it has become AI-based. So all of ChatGPT and everything you see nowadays is based upon these prompts, as you see. And that’s always because you are doing certain kinds of predictions based on this. This is a word-based prediction. Like you have one word and you have to predict what the next word would be in a probabilistic manner. That’s all ChatGPT is doing, to put it in very, very layman’s terms.
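Next-word prediction in its simplest form can be shown with word-pair counts. The Python sketch below is a deliberately tiny stand-in, assuming a bigram model built from a made-up sentence; systems like ChatGPT use large neural networks rather than a count table, and the names `train_bigram` and `next_word_probs` are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for every word, which words follow it in the text."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def next_word_probs(counts, word):
    """Turn the follow-up counts for `word` into probabilities."""
    following = counts.get(word.lower())
    if not following:
        return {}  # word never seen, nothing to predict
    total = sum(following.values())
    return {w: c / total for w, c in following.items()}


# Made-up training text for illustration.
corpus = "the gene is coding the gene is noncoding the protein folds"
model = train_bigram(corpus)

probs = next_word_probs(model, "the")
print(probs)                      # probability of each word that followed "the"
print(max(probs, key=probs.get))  # the single most likely next word
```

In this corpus "the" is followed by "gene" twice and "protein" once, so the model gives "gene" the higher probability.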
JL: So can you use it to predict a football game?
AG: Yes. So the football game context would be interesting because I will give a very interesting concept. There is a very good movie called Moneyball. And that Moneyball, it was about baseball. And I really liked that when they launched the movie. So Billie Jean [Billy Beane] or somebody, that was the name of the coach, actually took a whole lot of data collection from people.
So ideally the baseball people or the football people usually go for people who have got a whole set of skills. Yeah, like baseball is good pitching and good throwing, but that individual, I think for the Boston Sox [Oakland Athletics] or something, he collected just that data and he formed the team with just this data analytics, making the prediction that you don’t need a complete perspective, you need some good throwers and good pitchers. And trust me, in that movie, it is shown that, based upon this, he won 22 [20] games in a row following this principle, and it’s a record. None of the traditional sports coaches had belief in him, yeah, because he started with some losses. But then he had a phenomenal success rate with that. And that’s all about making predictions with the data that he had in hand.
And I would recommend our audience, if you have not watched this movie, go and watch it. It’s very interesting.
JL: In your presentation at the AI in Ag Symposium, you talked about three projects that benefit Arkansas agriculture. Can you summarize those briefly?
AG: The first project I am doing with the ENPL Department here, Entomology and Plant Pathology, and with the Computer Science department. Doctor Fiona Goggin from ENPL and Khoa Luu from Computer Science. Very competent scientists. In that particular project we are looking into a plant which is called Arabidopsis thaliana. Agricultural people will know that, and we are trying to correlate something called the genomic data and the phenomic data.
So what is genomic data? It is basically all the genetic data that we have, the information about the genes. But for the plants we also collect other kinds of data like we take several measurements, we take photographs of suppose the leaf size, what kind of crop infestation has happened with pesticides, and what actually changes in leaf diameter and this kind of thing. To our understanding, we have genetic data before.
We also have this kind of phenomic data. But there is no integration of this genomic and phenomic data together. And we are for the first time trying to integrate this genomic and phenomic data so we can get a complete perspective and not only have good biological context. But also we will be able to make a better set of precision plants in the future. So that’s one of the projects.
The second one is working with swine genetics because I’m part of the Animal Science Department. We have a large swine population. What we wanted to look at is the microbiome of the swine. And for people who do not know what the microbiome is, it is the beneficial microbes which are present in a particular animal, or a human it can be. And we want to look at how that microbiome grows within the swine population over a certain time period. And we are looking at the intestinal profiling because, although we have swine of similar ages with us, the microbial health and microbial maturity across the same age group of swine may vary depending upon their microbial pool.
So we are trying to address and analyze this microbial pool. And at the same time, what we are trying to do is also to develop machine learning models that if these are the good sets of bacteria that we have, then based upon this as a test set, what are the good amounts of bacteria that can be beneficial for that particular swine population?
Another project is actually directly under my lab, and I will mention it just very briefly. People know that in the poultry industry, specifically here, we have recently seen a great surge in the price of eggs and everything, because poultry went through a kind of epidemic and it cost billions of dollars in losses. So we are trying to study a particular pathogen called Enterococcus faecium, which infects poultry.
We are using computational methods to find out what the specific genes are which make that chicken sick because of that bacteria being present. And this work we are doing with called vendors, who is one of the biggest poultry companies over here. They are our stakeholders. We are trying to do this analysis to give them some candidates that they will be using as targets for non-vaccine-based antibiotics.
JL: You recently published an article about ChatGPT 5 and how the new AI era is changing science. Can you talk a little bit about that? And you mentioned there are some philosophical and scientific questions that come with it.
AG: We can make major scientific advances. This is my take on civilization. Like over the next hundred years, we can go to Mars or have cars that can fly in the air and solve many of our major diseases. But the basic propensities of humans, like people envying each other, people wanting bad things for each other, these are human tendencies that have persisted for generations.
So there is always an ethical perspective that we should take. And, if you know, the whole field of deep learning and machine learning was brought about by Geoffrey Hinton, who basically started this machine learning, deep learning thing. And he’s a little bit skeptical about the way AI is progressing, that it might control us. If AI becomes AGI, which is artificial general intelligence, AI systems might communicate with each other, and it is no longer under human control.
Having said that, with respect to my article that was published in a very leading journal back in India, what I would like to say as a summary is that, whatever people may understand from the fancy images or good prompts they’re getting, right now the AI model is nothing more than just a predictive model. What is happening, basically, is a probability. You are giving them one word, and they are trying to find out the next possible word. AI is not new. It has been known by different names like machine learning, machine intelligence, and this is part of statistical learning. And it has progressed for decades. But right now with the advent of very powerful computing machines, we are not making those mistakes, so we are getting meaningful interpretations.
So, that is why they are still not thinking in the way that we think. But the perils about it, that people who are far more experienced than me and some of the messiahs in the field, they say AI is not like our brain. We are far more sophisticated. But they sometimes do a brain-inspired design that can perform tasks that can overpower us. So there should be a word of caution regarding that. Otherwise, that AI brings a lot of good things as well, and we should exploit that. But we also see that the human value should not be gone.
JL: Well thank you, Doctor Goswami, for coming in today. We really appreciate your insights on AI and appreciate you coming to work for the University of Arkansas.
AG: And thanks, everybody, John and the whole team, who are helping me to put these things together because I, as a computational biologist, really want to put the University of Arkansas on the global map in the field of AI. I want people from outside the university to know that we can do very good machine learning and AI with respect to agriculture that benefits this community. And I’m very proud to do that. I am very excited to build the next generation of students who will do good AI-based work, genomics-based work, on agricultural problems.
JL: Thank you very much.
Short Talks from the Hill is available wherever you get your podcasts. For more information and additional podcasts, visit ArkansasResearch.uark.edu, the home of science and research news at the University of Arkansas. Music for Short Talks from the Hill was written and performed by local musician Ben Harris.


