According to the study, the AI systems were able to interpret the correct clock-hand positions less than a quarter of the time
In a new study, researchers from the University of Edinburgh have highlighted a significant shortcoming in some of the world’s leading artificial intelligence systems: they struggle with basic timekeeping tasks, such as reading analogue clocks and understanding calendars. Despite their strong performance on complex reasoning tasks, these AI models falter when asked to interpret clock-hand positions or answer questions about calendar dates.
The research, conducted by a team from Edinburgh’s School of Informatics, aimed to assess how well multimodal large language models (MLLMs) can answer time-related questions. The tests involved presenting these AI systems with various clock designs, including those with Roman numerals, different coloured dials, and even clocks without second hands. In all scenarios, the results were strikingly poor.
According to the study, the AI systems were able to interpret the correct clock-hand positions less than a quarter of the time. This is a concerning finding, as reading an analogue clock is a basic skill that most humans acquire at an early age. The research also found that performance worsened when Roman numerals or stylised clock hands were involved. Removing the second hand did not help the models detect the time, which points to deeper problems with how they perceive hand positions and angles.
Rohit Saxena, one of the lead researchers from the School of Informatics, explained the significance of the study’s findings: “Most people can tell the time and use calendars from an early age. Our findings highlight a significant gap in the ability of AI to carry out what are quite basic skills for people. These shortfalls must be addressed if AI systems are to be successfully integrated into time-sensitive, real-world applications, such as scheduling, automation, and assistive technologies.”
But the challenges didn’t end with clocks. The AI systems were also tested on calendar-based tasks, such as identifying holidays, calculating dates in the past, and predicting future ones. Even the best-performing model got date calculations wrong one-fifth of the time. This suggests that the combination of spatial awareness, contextual understanding, and basic arithmetic required to interpret and calculate with time is still a weakness for current AI models.
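To give a sense of how little arithmetic such calendar questions involve, here is a minimal Python sketch using only the standard library. The function names and the sample dates are illustrative assumptions; this is not the benchmark or the scoring code used in the study.

```python
from datetime import date, timedelta


def nth_day_of_year(year: int, n: int) -> date:
    """Return the calendar date of the n-th day of `year` (n starts at 1)."""
    return date(year, 1, 1) + timedelta(days=n - 1)


def shift_date(start: date, days: int) -> date:
    """Return the date `days` after `start` (negative values go back in time)."""
    return start + timedelta(days=days)


def weekday_name(d: date) -> str:
    """Return the English weekday name for a date."""
    return d.strftime("%A")


if __name__ == "__main__":
    # Hypothetical questions of the kind described above.
    d = nth_day_of_year(2025, 153)
    print(d.isoformat(), weekday_name(d))

    later = shift_date(date(2025, 4, 28), 100)
    print(later.isoformat(), weekday_name(later))
```

Conventional software answers these questions deterministically, which is why an error rate of one in five stands out.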
The research team’s findings bring attention to a critical issue: while AI systems have made tremendous progress in areas like natural language processing and complex problem-solving, they remain woefully inadequate at performing simple, everyday tasks that people routinely navigate. This gap poses significant challenges for integrating AI into real-world applications that require precision in time-related matters, such as scheduling meetings, setting reminders, and automating tasks in systems that depend on accurate timekeeping.
The study is a timely reminder that while AI is advancing rapidly, it still has fundamental limitations. “AI research today often emphasizes complex reasoning tasks, but ironically, many systems still struggle when it comes to simpler, everyday tasks,” said Aryo Gema, another researcher from Edinburgh’s School of Informatics. “Our findings suggest it’s high time we addressed these fundamental gaps. Otherwise, integrating AI into real-world, time-sensitive applications might remain stuck at the eleventh hour.”
The research also highlights an issue that has been largely overlooked in the race to develop more advanced AI systems: the importance of basic spatial awareness and contextual understanding. To read a clock or calendar accurately, a machine must be able to determine the position of the hands on a clock face, recognise the passage of time, and calculate dates correctly. These tasks require a combination of spatial reasoning and mathematical computation, which current AI models are still struggling to master.
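The mathematical half of clock reading is simple once the hand angles are known, as the following Python sketch illustrates. It assumes angles are measured in degrees clockwise from the 12 o’clock position; the function names are hypothetical, and the code is an illustration rather than anything taken from the study.

```python
def time_to_angles(hour: int, minute: int) -> tuple[float, float]:
    """Return (hour_hand_angle, minute_hand_angle) in degrees clockwise from 12."""
    minute_angle = minute * 6.0                      # 360 degrees / 60 minutes
    hour_angle = (hour % 12) * 30.0 + minute * 0.5   # 30 degrees per hour, plus drift
    return hour_angle, minute_angle


def angles_to_time(hour_angle: float, minute_angle: float) -> tuple[int, int]:
    """Recover (hour, minute) on a 12-hour dial from the two hand angles."""
    minute = round(minute_angle / 6.0) % 60
    # Subtract the drift caused by the elapsed minutes before reading the hour.
    hour = round((hour_angle - minute * 0.5) / 30.0) % 12
    return (12 if hour == 0 else hour, minute)


if __name__ == "__main__":
    angles = time_to_angles(3, 15)
    print(angles, angles_to_time(*angles))
```

The harder part for a vision model is the step this sketch takes for granted: extracting those angles from an image, and telling the hour hand from the minute hand.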
The findings underscore the need for a shift in focus for AI developers. While more complex tasks, such as answering open-ended questions or solving intricate problems, are valuable pursuits, it is equally important to ensure that AI can perform basic tasks with a high degree of reliability. This is especially crucial when it comes to time-sensitive applications in areas such as healthcare, autonomous systems, and customer service, where even small errors in time calculations can have serious consequences.
The study was conducted using several state-of-the-art AI models, which were evaluated using images of clocks and calendars. The models were tasked with identifying the correct times, understanding the relationships between dates, and solving simple time-based math problems. The results showed that even the most advanced AI models often fell short of human-level accuracy, particularly when dealing with more complex clock designs and calendar questions.
As the study points out, this shortcoming is not just a theoretical issue but a practical one that could limit the potential of AI in real-world applications. Time-sensitive tasks, from setting alarms to coordinating schedules, are an integral part of daily life, and AI must be able to perform these functions reliably if it is to become an effective tool for businesses and individuals alike.
The research will be presented at the upcoming Reasoning and Planning for Large Language Models workshop at the Thirteenth International Conference on Learning Representations (ICLR) in Singapore on April 28, 2025. The hope is that drawing attention to these fundamental gaps in AI’s capabilities will encourage researchers and developers to resolve them before AI is fully integrated into time-critical systems.
While AI has made impressive strides in many areas, it is clear that much work remains to be done in mastering even the most basic skills. If AI is to play a meaningful role in time-sensitive applications, addressing these fundamental weaknesses will be essential for its success in the real world.