Overview
Artificial intelligence (AI) is the ability of computers to perform functions associated with the human brain, including perceiving, reasoning, learning, interacting, problem solving, and exercising creativity. AI promises to be a fundamental enabler of technological advancement and progress in many fields, arguably as important as electricity or the internet. In 2024, the Nobel Prizes for Physics and Chemistry were awarded for work intimately related to AI.
Three of the most important subfields of AI are computer vision, machine learning, and natural language processing. The boundaries between them are often fluid.
Computer vision (CV) enables machines to recognize and understand visual information, convert pictures and videos into data, and make decisions based on the results.
Machine learning (ML) enables computers to perform tasks without explicit instructions, often by generalizing from patterns in data. ML includes deep learning that relies on multilayered artificial neural networks to model and understand complex relationships within data.
Natural language processing (NLP) equips machines with capabilities to understand, interpret, and produce spoken words and written texts.
Although AI draws on other subfields, it is based mostly on machine learning, which requires data and computing power, often on an enormous scale. Data can take various forms, including text, images, videos, sensor readings, and more. The quality and quantity of data play a crucial role in determining the performance and capabilities of AI models. Models may generate inaccurate or biased outcomes, especially in the absence of sufficient high-quality data. Furthermore, the hardware costs of training leading AI models are substantial. Currently, only a select number of large US companies have the resources to build cutting-edge models from scratch.
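To make the idea of learning patterns from data concrete, the sketch below (not drawn from the source) trains a tiny two-layer neural network with gradient descent to reproduce the XOR function. The network size, learning rate, and training data are illustrative assumptions; deep-learning systems apply the same principle with far larger networks and datasets.

```python
# Minimal sketch: a 2 -> 4 -> 1 neural network learns XOR from four examples.
import numpy as np

rng = np.random.default_rng(0)

# Training data: the four XOR input/output pairs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Randomly initialized weights for the two layers.
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 2.0
for step in range(10000):
    # Forward pass: compute the network's predictions.
    h = sigmoid(X @ W1 + b1)   # hidden-layer activations
    p = sigmoid(h @ W2 + b2)   # predicted outputs

    # Backward pass: gradients of the squared error for each weight.
    grad_p = (p - y) * p * (1 - p)
    grad_h = (grad_p @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ grad_p / len(X)
    b2 -= lr * grad_p.mean(axis=0)
    W1 -= lr * X.T @ grad_h / len(X)
    b1 -= lr * grad_h.mean(axis=0)

# After training, the predictions should be close to [[0], [1], [1], [0]].
print(np.round(p, 2))
```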
Key Developments
Dominating the AI conversation in 2024 were foundation models, which are large-scale systems trained on very large volumes of diverse data. Such training endows them with broad capabilities, and they can apply knowledge learned in one context to a different context, making them more flexible and efficient than traditional task-specific models.
Large language models (LLMs) are the most familiar type of foundation model and are trained on very large amounts of text. LLMs are an example of generative AI, which produces new material based on its training and the inputs it is given; an LLM works by statistically predicting which words are most likely to follow the words that have come before. These models generate linguistic output strikingly similar to that of humans across a wide range of subjects, including computer code, poetry, legal case summaries, and medical advice. Specialized foundation models have also been developed in other modalities such as audio, video, and images.
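The toy sketch below (an illustration, not taken from the source) shows the next-word-prediction idea in its simplest form: a bigram model counts which word follows which in a small made-up corpus, then generates text by sampling likely continuations. LLMs apply the same principle using neural networks over much longer contexts and vastly more data.

```python
# Toy bigram model: count word-to-word transitions, then generate text
# by repeatedly sampling a likely next word.
from collections import Counter, defaultdict
import random

corpus = (
    "the cat sat on the mat . the dog sat on the rug . "
    "the cat chased the dog ."
).split()

# Count how often each word follows each other word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

rng = random.Random(0)

def next_word(prev):
    """Sample the next word in proportion to how often it followed `prev`."""
    counts = follows[prev]
    words = list(counts)
    weights = [counts[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]

# Generate a short continuation starting from the word "the".
word, output = "the", ["the"]
for _ in range(8):
    word = next_word(word)
    output.append(word)
print(" ".join(output))
```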