Reasoning, Multimodality, and Agents: Three developments every data professional needs to know
Exploring how OpenAI's o1, Advanced Voice Mode, and autonomous systems are reshaping the future of data analysis and decision-making
Welcome to the first issue of "Subtle Machinery" in 2025. The AI landscape has seen significant developments over the last few weeks, with OpenAI's "12 Days of OpenAI" event introducing groundbreaking models and features, including a preview of the o3 model. Google has also made waves with Gemini 2.0 Flash, advancing multimodal capabilities.
This issue explores three key developments that are reshaping marketing and analytics: reasoning models that enhance problem-solving capabilities, true multimodality that transforms how we interact with AI, and autonomous AI agents that are revolutionizing workflow automation. Let's examine how these innovations will impact our work in 2025.
Reasoning models
OpenAI kicked off their 12 days with the release of their new flagship model, o1. While ChatGPT Plus users previously had access to o1-preview, the final o1 model delivers significantly better performance and faster responses. The o1 model family represents a new class of "reasoning models"—the next evolution in LLMs. Unlike previous models, which simply used the context from prompts and attached documents to craft a response, reasoning models add a chain-of-thought step. This allows them to reason step by step and break complex problems down into smaller parts. Trained with reinforcement learning, o1 can identify and correct its own mistakes and explore alternative approaches when needed. Alongside o1, OpenAI released o1-pro, available exclusively through the new ChatGPT Pro plan priced at $200 per month.
Google has also entered the arena with their newest model: Gemini 2.0 Flash. Though smaller and still experimental, it offers true multimodality (more on that later) and an 8x larger context window (1M tokens for 2.0 Flash vs. 128K for o1), enabling analysis of far larger problems in a single pass. Given that Flash models have historically been Google's most limited offerings (comparable to OpenAI's mini and turbo variants), the upcoming Pro models in the Gemini 2.0 family show tremendous promise.
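To get a feel for what a 1M-token context window enables in practice, here is a minimal sketch using the google-genai Python SDK to feed an entire report to Gemini 2.0 Flash in a single request. The model name, file path, and prompt are illustrative assumptions, and the SDK is still evolving, so treat this as a starting point rather than a recipe.

```python
# Minimal sketch: loading a long document into Gemini 2.0 Flash's large
# context window. Model name, file path, and prompt are illustrative
# assumptions; requires the google-genai package and an API key.
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

# Read a long document, e.g. a full annual marketing report, in one go.
with open("annual_marketing_report.txt", "r", encoding="utf-8") as f:
    report = f.read()

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",  # experimental model name at the time of writing
    contents=[
        report,
        "Summarize the three biggest drivers of incremental revenue "
        "and flag any assumptions that look shaky.",
    ],
)
print(response.text)
```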
Reasoning models have shown their most dramatic performance improvements in mathematics and coding, where problems have verifiably "correct" answers. For instance, o1 instantly identified a mathematical error in a controversial paper claiming that black kitchen utensils release toxins during cooking. This kind of rigorous analysis will be invaluable when reviewing assumptions and calculations, ensuring decisions and conclusions rest on solid foundations. AI adds an extra layer of confidence, even in highly specialized domains.
Less is known so far about performance in open-ended domains, where solutions often depend on complex context and constraints—the kind of challenges faced daily in business settings. While we lack validated data on how much reasoning models improve business tasks, I believe the gains will be substantial. Content creation may see only modest improvement, but fields like marketing analytics should experience significant advances. In my own comparison of GPT-4o and o1 responses for designing an incrementality test, o1 demonstrated notably deeper understanding and detail, while 4o's response was more generic—though not incorrect.
As we step into this new era of reasoning models, early experimentation is key to staying competitive. We'll see major advances in both the models and our understanding of their optimal use. I encourage you to run your own comparisons between reasoning and non-reasoning models—this practical experience will reveal where reasoning models excel (such as complex analysis) and where simpler models remain sufficient (like email drafting or text summarization). The earlier you begin exploring these capabilities, the better equipped you'll be to harness their full potential in your work.
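If you want to run such a comparison yourself, a quick sketch with the OpenAI Python SDK is enough: send the same analytical prompt to a non-reasoning and a reasoning model and read the answers side by side. The prompt and model names below are just examples; swap in whatever task matters in your own work.

```python
# Quick sketch for comparing a reasoning model against a non-reasoning one
# on the same task. Model names and prompt are examples; requires the openai
# package and an OPENAI_API_KEY environment variable.
from openai import OpenAI

client = OpenAI()

prompt = (
    "Design an incrementality test for a paid social campaign with a "
    "$50k monthly budget. Describe the test design, holdout size, duration, "
    "and the main threats to validity."
)

for model in ["gpt-4o", "o1"]:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"\n===== {model} =====")
    print(response.choices[0].message.content)
```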
Multimodality
Multimodality has been part of LLMs for some time now. Features like file uploads, image recognition, and voice output are available in both ChatGPT and Google's Gemini. However, these features never felt truly natural. While ChatGPT could generate images, it did so indirectly, by writing prompts for DALL-E rather than generating images natively. Similarly, a voice mode was available, but there's a significant difference between having responses read aloud and engaging in natural conversation.
The game-changer came with OpenAI's Advanced Voice Mode, which made ChatGPT interactions feel more natural and enabled features like humor and back-and-forth conversation. Now Advanced Voice Mode also supports video and screen sharing, expanding the AI's world model and allowing for more human-like interactions with ChatGPT.
Google raised the bar with the Multimodal Live API for Gemini 2.0, which lets you share your camera or screen while talking to Gemini at the same time. The AI can now reason about any document, video, or code file open on your screen. Even in this early version, the feature turns AI into a genuine pair programmer.
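The Live API itself streams audio and video over a persistent connection, but the core idea of the model reasoning about whatever is on your screen can be sketched in plain request/response form by sending a screenshot. The sketch below is a simplified stand-in for the Live API, not the Live API itself; the model name and file path are assumptions.

```python
# Simplified stand-in for "reasoning about your screen": send a screenshot to
# Gemini 2.0 Flash and ask about it. The real Multimodal Live API streams
# audio/video continuously; this only shows the request/response version of
# the same idea. Model name and file path are assumptions; requires google-genai and Pillow.
from google import genai
from PIL import Image

client = genai.Client(api_key="YOUR_API_KEY")

screenshot = Image.open("dashboard_screenshot.png")  # e.g. a BI dashboard

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents=[
        screenshot,
        "Walk me through this dashboard: which metric looks anomalous, "
        "and what would you check first?",
    ],
)
print(response.text)
```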
True multimodality brings a human touch to LLM interactions. It feels natural to politely interrupt the LLM during its reply for follow-ups or corrections. Advanced Voice Mode already offers compelling use cases, and with an expanded world model that can "see," the possibilities will only grow. Need to practice delivering difficult news to your team? An AI companion can provide feedback on clarity and help anticipate team members' questions. Preparing for an important presentation? Walk through your slide deck with AI tools that challenge you from different attendees' perspectives. The applications are virtually endless. While reasoning models expand what problems AI can solve, multimodality transforms how we interact with AI.
Agents
Since ChatGPT's release two years ago, developers have focused heavily on optimizing prompts and building AI workflows. This has enabled specialized use cases that connect multiple systems to create powerful automations. Tools like Make.com and Zapier have made these AI workflows accessible to non-technical users. However, while these workflows can be sophisticated, they remain static due to their rule-based structure.
We're now entering a new era in which AI systems not only facilitate communication between systems but also make autonomous decisions and recommend optimal workflows. This "agentic AI" approach is embraced by all of the newest models, with reasoning models particularly excelling at planning and breaking down complex tasks. Imagine shopping for a new TV: you simply tell an AI agent your preferences and budget, and it finds suitable options, analyzes prices, and recommends the best choice at the best price. While we're not quite at one-click purchasing yet, many companies are developing these tools with promising early results. Anthropic's Claude, with its Computer Use feature, offers a preview of what's coming—actively browsing websites, comparing information, and compiling research like a human assistant. Though still in its early stages, this capability clearly signals where AI agents are headed in 2025.
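To make "autonomous decisions" concrete, here is a minimal sketch of the loop most agent setups boil down to: the model is given tools, decides which one to call, receives the result, and continues until it produces a final answer. The single price-lookup tool and its hard-coded data are purely illustrative assumptions, not a real product API.

```python
# Minimal agent loop: the model decides when to call a tool, gets the result
# back, and keeps going until it answers. The price-lookup tool and its
# hard-coded catalogue are illustrative assumptions.
import json
from openai import OpenAI

client = OpenAI()

def lookup_tv_prices(max_budget: float) -> list[dict]:
    # Stand-in for a real product search; returns fake catalogue data.
    catalogue = [
        {"model": "55-inch OLED", "price": 1199.0},
        {"model": "65-inch QLED", "price": 899.0},
        {"model": "50-inch LED", "price": 449.0},
    ]
    return [tv for tv in catalogue if tv["price"] <= max_budget]

tools = [{
    "type": "function",
    "function": {
        "name": "lookup_tv_prices",
        "description": "Return TVs available at or below a maximum budget.",
        "parameters": {
            "type": "object",
            "properties": {"max_budget": {"type": "number"}},
            "required": ["max_budget"],
        },
    },
}]

messages = [{"role": "user", "content": "Find me a good TV under $1000 and justify your pick."}]

while True:
    response = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
    message = response.choices[0].message
    messages.append(message)

    if not message.tool_calls:
        print(message.content)  # final recommendation
        break

    for call in message.tool_calls:
        args = json.loads(call.function.arguments)
        result = lookup_tv_prices(**args)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(result),
        })
```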
For marketing and analytics professionals, agentic AI opens up remarkable possibilities. Imagine an AI that autonomously generates and tests marketing content for maximum engagement. It could create various ad copies, visuals, and calls-to-action for different audience segments, then conduct real-time A/B testing across platforms to identify the best performers. When it discovers that younger audiences engage more with humorous short-form videos while older audiences prefer detailed carousel ads, it automatically adjusts future content. This approach not only enhances campaign performance but also significantly reduces the time and resources spent on manual content creation and testing.
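The "automatically adjusts future content" part doesn't require anything exotic; at its simplest it's a bandit-style allocation loop. The sketch below uses an epsilon-greedy rule over hypothetical ad variants with simulated engagement rates; in a real campaign the engagement signal would come from your ad platform's reporting, and an agent would also handle generating the variants themselves.

```python
# Bare-bones epsilon-greedy allocation over ad variants. Variant names and
# engagement rates are simulated assumptions; in practice the reward would
# come from real campaign metrics (clicks, conversions) per impression.
import random

variants = ["humorous_short_video", "detailed_carousel", "static_banner"]
true_rates = {"humorous_short_video": 0.08, "detailed_carousel": 0.05, "static_banner": 0.02}

counts = {v: 0 for v in variants}
rewards = {v: 0.0 for v in variants}
epsilon = 0.1  # fraction of traffic reserved for exploration

for impression in range(10_000):
    if random.random() < epsilon:
        choice = random.choice(variants)  # explore a random variant
    else:
        # exploit the variant with the best observed engagement so far
        choice = max(variants, key=lambda v: rewards[v] / counts[v] if counts[v] else 0.0)
    engaged = random.random() < true_rates[choice]  # simulated engagement signal
    counts[choice] += 1
    rewards[choice] += engaged

for v in variants:
    rate = rewards[v] / counts[v] if counts[v] else 0.0
    print(f"{v}: {counts[v]} impressions, observed engagement {rate:.3f}")
```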
Conclusion
The pace of AI development shows no signs of slowing down as we move through 2025. While reasoning models, multimodality, and agentic AI represent major breakthroughs, they're just the latest wave in an ongoing revolution. Professionals in marketing and analytics who want to stay competitive must continuously adapt and learn, as each month brings new capabilities that can transform how we work with data and derive insights.

While it's easy to feel overwhelmed by the pace of change, remember that the key to success lies in practical experimentation. Start small: test a reasoning model's ability to analyze complex problems by comparing its output with traditional LLMs, try using voice and screen sharing for your next dashboard review, or experiment with simple AI agents to automate routine analytics tasks. The goal isn't to revolutionize your entire workflow overnight but to gradually incorporate these tools where they add the most value to your analytics work.

Stay curious and keep exploring. The tools we have today are just the beginning, and those who maintain an experimental mindset will be best positioned to leverage tomorrow's innovations. The future of AI-assisted analytics is not just about more powerful models—it's about more natural, intuitive, and productive ways of working together with AI.