The ABCs of AI in the Natural World
Five Arts and Sciences faculty members explored the impact of AI on a wide range of scientific research during the Columbia AI Summit.
A diverse group of Arts and Sciences faculty gathered on Tuesday, March 4th, as part of the Columbia AI Summit to discuss how artificial intelligence tools are enabling new approaches and discoveries across a range of fields in the natural sciences.
Titled “The ABCs of AI in the Natural World: From Animals to Batteries to Climate,” the discussion featured faculty from four A&S departments: Biological Sciences; Chemistry; Ecology, Evolution, and Environmental Biology (E3B); and Statistics. The faculty shared their views on the benefits, and potential pitfalls, of using AI to advance scientific research. The session was moderated by Provost Angela Olinto, herself an accomplished scientist and faculty member in the Department of Astronomy and Department of Physics.
The panelists provided a brief overview of their research and highlighted several active projects where AI has helped support their work.
Unlocking New Discoveries
For Tian Zheng, Professor and Chair of the Department of Statistics, there are “two sides to the AI coin.” One side is the pursuit of research using AI tools such as large language models to better understand complex data. The other side – which is her focus and that of statisticians – is ensuring that AI tools are developed properly for the task, especially when the data might not have been collected for the purpose for which it is being used. “If AI is the car,” she notes, statisticians “make the parts for the car.”
She highlighted three projects where she and her Columbia colleagues are leveraging AI tools to support new interdisciplinary research in the natural sciences: a collaboration with the E3B department to study how rainforests react to extreme weather; a project through the LEAP Center to develop more accurate climate models, including for areas prone to urban flooding; and a project to better identify and reach populations at high risk for opioid abuse and addiction.
Richard Friesner, Professor in the Department of Chemistry, discussed how AI and machine learning tools have advanced his research on lithium-ion batteries, which power essential technology like cell phones and electric vehicles and are often used in power grids to store and release energy. His research seeks to understand the complex chemical reactions inside batteries in the hope of developing a more efficient, longer lasting, and less expensive power source.
“It is hard to study what goes on inside batteries experimentally,” he notes, because of the complexity of the chemical reactions inside the closed system. Instead, he is using AI and machine learning tools to produce a simulation of those reactions – something, he says, “chemists haven’t been able to do for 50 years.” Using AI and machine learning allows him to “solve equations a million times faster and more efficiently and reliably than we’ve been able to do in the past.”
Dustin Rubenstein, a professor in the Department of Ecology, Evolution, and Environmental Biology (E3B) who studies how environmental variation shapes the genome and phenotype of animals, discussed how AI tools are enabling researchers to identify new patterns of behavior based on the analysis of large sets of observational data. The data comes from a variety of sources, including cameras in the field, crowdsourced photos from cell phones, and even historic images in museum collections.
“The ability to get genomic data has become easy,” he notes, because of advances in the sequencing of genes. “But the ability to get phenotype data” – such as patterns of color on butterflies – “hasn’t caught up.” He and other researchers are using AI tools to help sift through the data, including still and video images, remote sensor data, and even satellite imagery, to gain a more complete picture not only at the species level, but for individual animals.
“Zebras were among the first for species identification because they have a built-in bar code,” he shared. By tracking individual animals using data and AI tools, scientists can improve their ability to make predictions about how individuals change their behavior, and the role environmental factors play in evolutionary changes over extended periods.
Harmen Bussemaker, a professor in the Department of Biological Sciences, also studies genomic data, but through a different lens. He and his research team look at how proteins read instructions on sequences of DNA to determine which genes should be turned on or off, which influences how organisms develop and respond to internal and external conditions. He is using AI tools to train models to predict different properties based on those DNA sequences.
“There are so many types of molecules that it poses a technical problem because the data are so sparse,” he notes. “We throw a mix of millions of different molecules against the wall like spaghetti.” AI helps connect the dots so that he can make accurate predictions about individual molecules and their properties.
Notes of Caution
Even with these advances, however, the panelists added a note of caution about AI. Data can be messy, incomplete, or biased. It’s not always clear how data sets have been generated. Models can be wrong. And researchers don’t always understand the problems they are trying to solve.
“Is the model extracting the right information?” Tian Zheng asks. “It is an iterative process. The trend in AI is to pile up data but we must rigorously evaluate it to make sure the predictions are good.”
“You must also do experiments,” adds Richard Friesner. “AI models only make approximations, and you have to make sure the AI models are accurate.”
Provost Angela Olinto asked panelists for their “worst nightmare” scenario in the use of AI, and how to manage that challenge. The panelists offered several suggestions.
“It’s important to have long term collaborations with experimental labs,” noted Harmen Bussemaker. “You need to learn how data is generated. It’s an iterative cycle. And experiments don’t always work the first time.”
Dustin Rubenstein noted the pressure scientists face to move quickly given the rapid evolution of AI. “We’ve seen all the money U.S. companies have put into large language models and DeepSeek comes along and does it for less,” he said. “The field is moving so fast, and it takes a long time to do some of the modeling. If you’re not moving fast, you get left behind.”
Faith in AI models is also a recurring concern, particularly when funding comes into the picture. “The scary part,” Richard Friesner notes, “is when someone commits a lot of resources when they believe your prediction. At some point, if the theory is successful people start to take it seriously. You have to find out whether you are fooling yourself or you have something real.”
Tian Zheng agreed. “The model becomes so complicated, you get nervous about the result being too good to be true.”
What about the next generation of scientists: the students in Columbia’s classrooms and science labs? How should they prepare for a world saturated with AI?
For Harmen Bussemaker, it starts with fundamentals, like learning exploratory data analysis. “What kind of plot do you make? How do you get a P-value? It’s very elementary, but important to have a foundation of understanding.” He worries students could lose their connection to the underlying knowledge.
Collaboration across fields is also important, notes Dustin Rubenstein. “Students need to learn how to work on a team in addition to individual work. How do we build a more collaborative environment for science?”
Curricular changes may also be necessary. “Some courses haven’t changed in 35 years because some things need to be learned,” notes Richard Friesner. “But the impact of machine learning is so large that some fundamental changes are needed.” He suggests introducing machine learning concepts earlier and making clear that whether you are a theorist or an experimentalist, “You need to understand how AI and machine learning are being used and impacting fields.”
This is especially true for undergraduates, Tian Zheng notes. “We are trying to introduce low-barrier activity into intro courses,” to invite undergraduates to think about research and expose them to scientific discovery using AI tools.
Despite the challenges and limitations posed by AI, the panelists all agreed that we are in a unique and exciting moment in the natural sciences. AI is creating new opportunities to ask different kinds of research questions, find solutions faster and across a broader range of disciplines, and prepare students at all levels for a new frontier of scientific learning and discovery.