The Future of Materials Informatics: Research through Artificial Intelligence
Written by Dale E. Karas
Artificial intelligence (AI) is a trending topic at professional conferences and special academic lectures, and it is pervasive in popular culture, both because of how aggressively its adoption is being pursued and because of the ethical issues raised by implementing such technologies. With present-day automation increasingly replacing menial industrial labor, especially in tasks where it is meant to answer moral questions about eliminating risk and human error, what does its seemingly inevitable development mean for humanity?
Near the turn of the century, such questions about AI were abruptly brought into focus by the televised 1997 rematch between IBM’s “Deep Blue” chess supercomputer and reigning world champion Garry Kasparov (considered by many to be one of the best chess players in history, along with Bobby Fischer and Paul Morphy). A year prior, IBM’s previous version of Deep Blue had underperformed and underwhelmed audiences, succumbing to mistakes that expert human players would effortlessly avoid. The contrast between the 1996 match and the incredible upset a year later made the prospects of AI seem all the more revolutionary: how could a computer, originally prone to a series of comedic errors, now surpass humans at such a complex task?
Advances in the semiconductor industry have made it possible to manufacture faster and smaller integrated circuits, and fabrication in optics and photonics has become more cost-efficient and more amenable to boosting computer processing speeds, memory switching, and data storage capacity. As a result, the newer field of computational “neural networks” has gained considerable interest. These are a series of algorithms used for machine learning, acquisition and processing of large data sets, and pattern recognition.
Two decades after the IBM-Kasparov match, the DeepMind team at Google met with comparable controversy over what it posited as its own analogous technological breakthrough. (Kasparov had accused IBM’s team of cheating, alleging that it consulted human grandmasters for outside support, motivated by the company’s interest in raising its stock price through the publicity.) DeepMind’s “Alpha” platform, using techniques from computational neural networks, had mastered the game weiqi (referred to as “Go” in the Americas) with its AlphaGo architecture months prior, and the team then leveraged the same technology for chess, with the AlphaZero incarnation beating the world’s top open-source engine, Stockfish, in a 100-game match. As with IBM’s Deep Blue, AlphaZero was rapidly retired after its victory, further fueling critical suggestions of conspiracies at work. Despite the objections of notable FIDE title holders (an outdated version of Stockfish was used, and the time controls and hardware configuration were suboptimal for Stockfish’s needs), the more apparent breakthrough was not so much that classical open-source engines were obsolete, but that computational neural networks could be more adept at solving fairly complex problems than had been previously demonstrated.
At this year’s Materials Research Society (MRS) Meeting in Phoenix, Ariz., distinguished computer scientists and materials specialists hosted a special “Materials Informatics” symposium on artificial intelligence development for research prospects. The panel included such speakers as Carla Gomes (Cornell University), Subbarao Kambhampati (Arizona State University), Patrick Riley (Google Research), and Krishna Rajan and Kristofer Reyes (University at Buffalo, State University of New York).
What are some common myths and misunderstandings of AI?
While there are many misconceptions, lots of hype, and a belief that certain novel algorithms and processing routines create the “magic” of AI, it must be noted that many current models we deem “AI” are happy to learn whatever they are trained on. That means they will learn inconsistencies, and input errors will scale dramatically. Kambhampati remarked, “Human intelligence [in contrast] is not just learning from data. We lampoon proposals as ‘post-2012’ if they give too much credit to rapid learning processes.” On the subject of how AI relates to deep learning as implemented with neural networks, Gomes stated that “it is dangerous for this community to think everything is going to be solved by deep learning. All of it is merely regression analysis. You would not apply this to many methods of teaching, and this is where researchers need to be flexible with their approaches of implementation.” Lastly, Reyes mentioned “the usual conflation between ‘big data’ and ‘deep learning,’ as they are very separate topics.” The statistics used in many of the aforementioned methods are based on Bayesian inference techniques. While causality can be determined through successive experimentation, much data processing can only determine correlation, which alone cannot establish causation.
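The distinction between correlation and causation can be made concrete with a small simulation. The following is only a minimal sketch under stated assumptions, not something presented by the panel: the names `hidden_factor`, `property_a`, and `property_b` are hypothetical, and the data are synthetic. Two measured quantities are both driven by an unobserved factor, so they correlate strongly even though neither causes the other. When an experimenter sets the first quantity directly, the correlation disappears.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
n_samples = 5000

# Hypothetical unobserved factor (for example, a processing condition
# that nobody recorded). This is an assumption made for illustration.
hidden_factor = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

# Passive observation: both properties depend on the hidden factor
# plus independent noise. Neither property influences the other.
property_a = [h + rng.gauss(0.0, 0.5) for h in hidden_factor]
property_b = [h + rng.gauss(0.0, 0.5) for h in hidden_factor]
r_observed = pearson(property_a, property_b)

# Experiment: property_a is now set by the experimenter, independent of
# the hidden factor. property_b is generated exactly as before.
property_a_set = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
property_b_after = [h + rng.gauss(0.0, 0.5) for h in hidden_factor]
r_intervened = pearson(property_a_set, property_b_after)

print(f"correlation from passive observation: {r_observed:.2f}")
print(f"correlation after intervention:       {r_intervened:.2f}")
```

A model trained only on the passively observed data would happily learn to predict one property from the other, which is the sense in which such methods capture correlation. Only the controlled experiment reveals that changing the first property does nothing to the second.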
What is it going to take to get us from “data acquisition” to “understanding knowledge”? For instance, will AI robots be teaching us in the next century of physics and chemistry?
A good example of this transition comes from the idea of data arrangement. Rajan posed the following question: “Could we teach the periodic table without knowing chemistry? People struggle with knowing the set of data; but what are the larger implications, and why is it important?” Just as Dmitri Mendeleev recognized the gaps in his periodic table, many chemistry Nobel Prizes were essentially awarded for filling in those gaps by securing the missing elements, and the same idea was applied to how data processing can support the understanding of knowledge. The AI community is largely aware of this challenge, with major research areas focused on bridging the gap.
How much faster will research in general progress with AI?
The panel agreed that we can largely expect a faster accumulation of data, as is already happening in the biological community. Rajan noted, “The accumulation of data shouldn’t be the goal. If one needs more data, we should instead be concentrating on what the problem is. Can we improve upon connections prior to mass accumulation?” He also lamented that “many experiments are not designed for high throughput. Data awareness is really based on where the data exists, and how does one train people to understand data, rather than just teaching pure methods?” Kambhampati additionally stated: “It is important to remember that it wasn’t that people were not creating data before. It’s just that the data wasn’t being collected! Human beings are making trillions of bytes of data nowadays with mobile technologies and electronic forms of documentation. Data can be captured in many forms, and this has surged the ‘second coming’ of neural networks.” Researchers, of all people, are fortunate that data capture has become an obsessive and effortless practice, one that can help support such a new rulebook.
Some publishers are interested in posting experimental raw data online. Can you comment on opportunities and/or challenges with this?
While there was agreement that this effort has strong pedagogical value, Riley commented that “trends to put raw data online is important for experimental repeatability, yet most members of this community believe that it has relatively low value for powering future discovery, especially in potentially preventing novelties. New methods get better all the time, so we should be cautious about the practice of formalizing raw data.” Raw data accumulation may help demonstrate scientific rigor, but it would probably not be helpful in the long term, and it would only add complexity to deciding what the most crucial data storage needs are.
What kind of data, in particular, is suitable for training neural networks? How can materials scientists support AI opportunities?
Rajan noted: “Let’s pick problems that I’ve already worked on. Can I use these methods to discover something I didn’t know? Data science shows that fundamental physics parameters get masked when there is a plethora of variables.” The panel stated that the role of materials scientists should be to contribute what they already know, rather than artificially scrounging for popular techniques from the AI community.
In the coming decades, researchers will thoroughly put these ideas to the test. Establishing an array of formalisms for research methodology will be pivotal not only for developing more unified policies and ethics on the subject of AI, but also for helping scientists understand how to pose research questions and how to measure research success. The panel concluded that the most imminent acceleration in data may come from encouraging researchers to be more transparent about failures, that is, about what does not work in research. If there is a way of capturing all false steps as well as good steps, that will help many long-term goals for deep learning. Perhaps even “pseudo” experiments, which find comparable outputs given other inputs, would be of enormous value in a publishing setting free of page limits and critical publisher reviews. We may also come to accept the lack of proprietary technical developments, such as the game theory data engines of IBM and Google, once such a formalism for supporting best research practices under these new criteria is thoroughly understood in terms of both risk and value.