As global businesses increase their focus on artificial intelligence (AI) projects, the need for high-quality training data has become a significant hurdle. Major tech giants have secured exclusive partnerships to expand their proprietary datasets, leaving smaller enterprises with limited access. In response, Salesforce has launched ProVision, a framework that generates visual instruction data. The system produces datasets for training high-performance multimodal language models (MLMs) that can answer questions about images. The ProVision-10M dataset, created with this method, is already being used to improve the performance and accuracy of various multimodal AI models.
For data professionals, ProVision is a notable advance. It reduces reliance on limited or inconsistently labeled datasets, a common issue in training multimodal systems, and it offers better control, scalability, and consistency, speeding up iteration cycles and lowering the cost of acquiring domain-specific data. The work complements ongoing research in synthetic data generation, such as Nvidia's Cosmos, which generates physics-based videos for AI training.
Instruction datasets are vital for AI pre-training and fine-tuning; these specialized datasets help models respond effectively to specific instructions or queries. In the multimodal case, models learn to analyze content such as images by training on those images together with question-answer pairs that describe them. Producing such visual instruction datasets, however, is a complex process. Manual creation for each training image is time-consuming and resource-intensive, while using proprietary language models can incur high computational costs and introduce inaccuracies in the question-answer pairs. It also lacks transparency, making it hard to understand how the data was generated or to customize the outputs.
Salesforce's AI research team developed ProVision to address these issues. It uses scene graphs and human-written programs to systematically synthesize vision-centric instruction data. Scene graphs are structured representations of image semantics: objects are represented as nodes, their attributes are attached to those nodes, and relationships between objects are depicted as directed edges connecting the nodes. Once the scene graphs are prepared, they power data generators that create question-and-answer pairs for AI training pipelines (a minimal sketch appears below). Salesforce assembled scene graphs through both manual annotation and generation from scratch, and wrote 24 single-image and 14 multi-image data generators that operate on them.
The ProVision-10M dataset, composed of more than 10 million unique instruction data points, is now available on Hugging Face and is already proving effective in AI training pipelines; when incorporated into multimodal fine-tuning, it has shown significant improvements in model performance. While several tools exist for generating different data modalities for multimodal AI training, few address the creation of the paired instruction datasets themselves. ProVision tackles this bottleneck by generating instruction data programmatically, an approach that keeps the generation process interpretable and controllable, scales efficiently, and maintains factual accuracy.
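To make the scene-graph approach concrete, here is a minimal Python sketch of how a programmatic data generator might turn a scene graph into question-answer pairs. The graph structure, field names, and generator functions are illustrative assumptions, not Salesforce's actual ProVision code.

```python
# Minimal illustration of ProVision-style programmatic instruction generation.
# All structures and function names here are hypothetical, not Salesforce's API.

scene_graph = {
    "objects": {                       # nodes: object id -> attributes
        "cup_1": {"name": "cup", "color": "red", "material": "ceramic"},
        "table_1": {"name": "table", "color": "brown"},
        "dog_1": {"name": "dog", "color": "black"},
    },
    "relations": [                     # directed edges: (subject, predicate, object)
        ("cup_1", "on", "table_1"),
        ("dog_1", "under", "table_1"),
    ],
}

def attribute_qa(graph):
    """Yield question-answer pairs about object attributes."""
    for obj in graph["objects"].values():
        if "color" in obj:
            yield (f"What color is the {obj['name']}?", obj["color"])

def relation_qa(graph):
    """Yield question-answer pairs about relations between objects."""
    objs = graph["objects"]
    for subj, pred, tgt in graph["relations"]:
        yield (f"What is the {objs[subj]['name']} {pred}?", objs[tgt]["name"])

def counting_qa(graph):
    """Yield a counting question for each object category in the graph."""
    counts = {}
    for obj in graph["objects"].values():
        counts[obj["name"]] = counts.get(obj["name"], 0) + 1
    for name, n in counts.items():
        yield (f"How many {name}s are in the image?", str(n))

if __name__ == "__main__":
    for generator in (attribute_qa, relation_qa, counting_qa):
        for question, answer in generator(scene_graph):
            print(f"Q: {question}  A: {answer}")
```

Because every answer is read directly off the graph rather than produced by a model, the resulting pairs are accurate by construction, which is the property that makes programmatic generation attractive.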
In the future, Salesforce hopes researchers can build on this work to improve the scene graph generation pipelines and create more data generators for new types of instruction data, such as those for videos. This development in generative AI is essential for companies seeking to maximize their return on investment in AI projects.
For organizations keen on integrating AI agents, the preliminary step is fine-tuning them, a process that can quickly become repetitive. Some organizations want agents that perform a single task within a specific workflow, but agents often need to be introduced to new environments and expected to adapt accordingly. A team of researchers from a leading Chinese university has introduced a method, dubbed AgentRefine, that trains agents to self-correct, resulting in AI agents that can adapt and generalize more effectively.
The researchers noted that existing tuning methods restrict agents to tasks identical to their training data, also known as "held-in" tasks, so these agents do not perform as well in new, or "held-out," environments. Agents trained with these methods struggle to learn from their mistakes, which keeps them from becoming generalized agents that can be integrated into new workflows. To overcome this limitation, AgentRefine is designed to generate more generalized training data that lets the model learn from its mistakes and adapt to new workflows. According to the researchers, the objective of AgentRefine is to produce data that promotes both agent generalization and self-refinement: an agent that can self-correct avoids repeating learned errors when deployed in a new environment. The researchers found that using self-refinement data for agent tuning encouraged the agent to explore a broader range of actions in challenging situations, resulting in improved adaptability to new environments.
Drawing inspiration from the popular tabletop role-playing game Dungeons & Dragons, the researchers designed personas, scripts, and challenges for the agent, and segmented data construction into three stages: script generation, trajectory generation, and verification. During script generation, the model creates a guide that includes information about the environment, tasks, and potential actions for personas. During trajectory generation, the model acts as both player and game master, producing agent data that deliberately includes errors; it evaluates possible actions and identifies the mistakes. The final stage, verification, reviews the script and trajectory, keeping the data that gives trained agents the opportunity to self-correct (a simplified sketch of this flow appears below).
The researchers found that agents trained using the AgentRefine method and dataset performed better on a variety of tasks and adapted more effectively to new scenarios. These agents were better at self-correcting, adjusting their actions and decision-making to avoid errors, which increased their robustness. Specifically, AgentRefine enhanced the performance of all the tested models on held-out tasks.
For enterprises, it is crucial that AI agents adapt to different tasks rather than merely repeating what they have learned; this adaptability makes them better at decision-making. Orchestrating agents involves not only directing multiple agents but also verifying that they have completed tasks based on user requests. Tools like OpenAI's o3, which offers "program synthesis," could enhance task adaptability, while other orchestration and training frameworks, such as Microsoft's Magentic-One, outline actions for supervisor agents that learn when to delegate tasks to other agents.
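To illustrate the three-stage flow, here is a minimal, self-contained Python sketch. The environment, turns, and verification rules are hypothetical stand-ins; in AgentRefine itself an LLM writes the script and role-plays both the agent and the game master.

```python
# Illustrative sketch of an AgentRefine-style data pipeline (not the paper's code).

# Stage 1 (script generation): environment, task and action space for a persona.
# Hard-coded here; in AgentRefine an LLM writes this guide.
SCRIPT = {
    "environment": "A locked library; the persona is a novice wizard.",
    "task": "Find the silver key and open the archive door.",
    "allowed_actions": ["look", "open_drawer", "take_key", "open_door"],
}

def generate_trajectory(script):
    """Stage 2: a role-play in which the agent first errs, then self-corrects.
    The turns are hard-coded here; in AgentRefine an LLM produces them."""
    return [
        {"thought": "The key might be behind the door.", "action": "open_door",
         "observation": "The door is locked.", "is_error": True},
        {"thought": "The door is locked, so I should search the desk first.",
         "action": "open_drawer", "observation": "You see a silver key.", "is_error": False},
        {"thought": "Take the key, then try the door again.", "action": "take_key",
         "observation": "Key acquired.", "is_error": False},
        {"thought": "Now the door should open.", "action": "open_door",
         "observation": "The archive door opens.", "is_error": False},
    ]

def verify(script, trajectory):
    """Stage 3: keep the sample only if every action is legal and at least one
    erroneous step is followed by a corrective step."""
    legal = all(t["action"] in script["allowed_actions"] for t in trajectory)
    has_correction = any(
        t["is_error"] and not trajectory[i + 1]["is_error"]
        for i, t in enumerate(trajectory[:-1])
    )
    return legal and has_correction

if __name__ == "__main__":
    traj = generate_trajectory(SCRIPT)
    print("Keep this training sample?", verify(SCRIPT, traj))
```

Only samples that contain a detected error followed by a successful correction survive verification, which is what gives the tuned agent concrete examples of recovering from its own mistakes.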
Large language models (LLMs) continue to struggle to generate factually accurate responses, especially to complex queries where users need detailed, specific answers. To tackle this issue, researchers from a leading tech company have developed FACTS Grounding, a benchmarking system that evaluates the ability of LLMs to produce factually accurate responses grounded in comprehensive documents. The benchmark also assesses whether the responses are sufficiently detailed to provide relevant and useful answers.
In addition to the benchmark, the researchers have launched a FACTS leaderboard on the Kaggle data science platform. The leaderboard, which will be regularly updated, currently shows Gemini 2.0 Flash in the top spot with a factuality score of 83.6%. Other notable models scoring above 61.7% in accuracy include Google's Gemini 1.5 Flash and Gemini 1.5 Pro, Anthropic's Claude 3.5 Sonnet and Claude 3.5 Haiku, and OpenAI's GPT-4o, 4o-mini, o1-mini and o1-preview. The researchers believe the FACTS benchmark fills a gap by evaluating a broad range of model behaviors related to factuality, whereas other benchmarks target more specific use cases, such as summarization.
Achieving factual accuracy in LLM responses is difficult for both modeling and measurement reasons. Traditional pre-training focuses on predicting the next token given previous tokens; this does not directly optimize the model for factuality scenarios, but instead encourages it to generate generally plausible text. To address this, the FACTS dataset includes 1,719 examples, each requiring a detailed response based on the context of a provided document. A response counts as accurate only if it is comprehensive and fully attributable to the document; it counts as inaccurate if the model's claims are not supported by the document or are not highly relevant or useful.
Each example is judged in two phases. First, responses are evaluated for eligibility and disqualified if they do not satisfy the user's request. Second, responses must be hallucination-free and fully grounded in the documents provided. Three different LLM judges each determine a score based on the percentage of accurate model outputs, and the final factuality score is the average of the three judges' scores (a simplified sketch of this scoring scheme appears below).
The researchers emphasize that factuality and grounding are key to the future success and usefulness of LLMs, and that comprehensive benchmarking, alongside continuous research and development, will continue to improve AI systems. They also recognize that benchmarks can quickly become outdated due to rapid advancements in the field, making the launch of the FACTS Grounding benchmark and leaderboard just the beginning.
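Here is a minimal Python sketch of that two-phase judging and aggregation. The verdict format, judge names, and example counts are assumptions for illustration, not the benchmark authors' actual evaluation code.

```python
# Simplified sketch of the FACTS Grounding scoring scheme described above:
# phase 1 disqualifies responses that don't satisfy the user request; phase 2 checks
# that surviving responses are fully grounded in the source document; the final
# score averages the per-judge accuracy rates.

def factuality_score(verdicts_by_judge):
    """`verdicts_by_judge` maps a judge name to a list of per-example verdicts,
    each a dict like {"eligible": bool, "grounded": bool}."""
    per_judge_scores = []
    for judge, verdicts in verdicts_by_judge.items():
        # An example counts as accurate only if it is both eligible (answers the
        # request) and grounded (every claim supported by the document).
        accurate = sum(1 for v in verdicts if v["eligible"] and v["grounded"])
        per_judge_scores.append(accurate / len(verdicts))
    # Final benchmark score: the average of the individual judges' scores.
    return sum(per_judge_scores) / len(per_judge_scores)

# Toy usage with three hypothetical judges scoring four examples each.
example_verdicts = {
    "judge_a": [{"eligible": True, "grounded": True}] * 3
               + [{"eligible": True, "grounded": False}],
    "judge_b": [{"eligible": True, "grounded": True}] * 4,
    "judge_c": [{"eligible": False, "grounded": True}]
               + [{"eligible": True, "grounded": True}] * 3,
}
print(f"Factuality score: {factuality_score(example_verdicts):.1%}")  # -> 83.3%
```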
OpenAI, renowned for artificial intelligence models that run primarily on cloud servers, websites, and computer and mobile applications, is expanding its horizons beyond software. The company recently announced the creation of its first hardware robotics positions, signaling a significant investment in its own robotics division. The announcement was made by Caitlin Kalinowski, a technical expert at OpenAI and former head of AR glasses at Meta. The new positions include a systems integration electrical engineer to design sensor suites for the company's robots, a mechanical robotics product engineer to develop gears, actuators, motors, and linkages for robots, and a technical program manager (TPM) to oversee logistics in the data-gathering lab.
The job descriptions reveal the company's ambitious new direction: "Our robotics team is focused on unlocking general-purpose robotics and pushing towards AGI-level intelligence in dynamic, real-world settings. We integrate cutting-edge hardware and software across the entire model stack to explore a wide range of robotic form factors. Our goal is to seamlessly blend high-level AI capabilities with the physical constraints of reality."
Kalinowski, who joined OpenAI just over two months ago, has been tasked with leading the robotics and consumer hardware division. Previously, OpenAI had collaborated with Jony Ive, Apple's former lead designer, and partnered with robotics startup Figure to supply the AI models for its humanoid robots. The move into hardware and robotics could position OpenAI as a competitor to Figure. This would not be uncharted territory for the company, which already competes with, and receives investment from, Microsoft.
In 1971, the Advanced Research Projects Agency Network (ARPANET), the precursor to the internet as we know it today, had a mere 1,000 users. The @ symbol was relatively unknown until engineer Ray Tomlinson devised a system to send messages across the ARPANET using the @ symbol to denote recipients. Thus, email was born, not from a company's product development team but from a user seeking a solution to a problem. This pivotal innovation was not fully appreciated until 1993, more than two decades later.
This anecdote underscores the often overlooked yet significant role of users in fostering disruptive innovation. From the dishwasher and telephone to modern tech companies like Airbnb, many groundbreaking inventions have been the brainchild of users seeking to solve their own problems. Our recent study, published in the Journal of Product Innovation Management, analyzed 60 cases of disruptive innovation, ranging from LASIK surgery to electric power tools. We found that nearly half of these innovations originated from users rather than producers. Users' intimate understanding of a problem, and their unique perspective on where current solutions fail, can offer invaluable insights. By integrating these insights with their own technical expertise, companies can unlock a wealth of growth opportunities and gain a competitive edge.
Disruptive ideas often emerge from individual consumers seeking to meet their own needs, as well as professionals looking for new tools or systems to enhance their job performance. For instance, the heart-lung machine, which paved the way for successful open-heart surgeries, was developed by a physician and his wife. Our research challenged the prevailing narrative that disruptive innovation primarily originates from startups and new market entrants, with larger, established companies lagging behind. Instead, we found that users can be a valuable source of innovative ideas rather than a hindrance.
So, how can companies harness the potential of user-generated innovation? First, fostering a culture of open innovation that values external insights is crucial. While your R&D team may be adept at creating something new, it may not always know what needs to be built, and seeking user insights is particularly important during periods of rapid change in customer needs. Establishing channels for dialogue and engagement with customers is vital: beyond surveys and focus groups, companies need to delve into unmet needs and pain points to identify truly disruptive ideas. Monitoring social media and online user communities can reveal how users adapt existing products and what new functionality they want. Focusing on lead users, who are often ahead of market trends, can also be beneficial; these users are often the first to identify emerging needs and can provide valuable input for new solutions, though their feedback should be treated with caution, as they may prioritize niche functionality that does not appeal to the mainstream market. Lastly, co-creation initiatives that promote direct collaboration with user innovators, such as contests for new product or feature ideas, or hackathons that bring together users and technical experts, can generate potentially disruptive solutions.
In their quest for innovation, companies often overlook one of the most potent sources of groundbreaking ideas: their own users. By tapping into this vast pool of creativity and expertise, companies can fuel truly disruptive innovation.
Christina Raasch and Tim Schweisfurth are Professors at the Kühne Logistics University and Hamburg University of Technology in Germany, respectively. They contribute to DataDecisionMakers, a platform where experts share data-related insights and innovation.
The CES 2025 tech trade show in Las Vegas concluded after a six-day run, attracting a crowd of approximately 141,000 attendees. As a journalist, I had the opportunity to explore and report on some of the most innovative technology on display. From robotics to AI, the technology of the future is fast approaching, and I am excited to welcome these advancements. Throughout the event, I covered around 46.79 miles, or 105,433 steps, over 5.5 days, a slight increase from the previous year. Despite the physical strain, I managed to avoid any illness and produced 65 articles on the event. This article focuses on the most impressive tech I encountered at the show.
This year, the event showcased nearly 4,500 exhibitors, a steady increase from previous years. Among the standout technologies were smart mirrors, AI gaming panels, and advanced robotics. One significant announcement was Nvidia's unveiling of the Cosmos world foundation model platform, designed to accelerate the development of AI systems for real-world applications. Other notable technologies included Soliddd's SolidddVision smart glasses, designed for people with macular degeneration, and Onscreen's AI companion app, Joy, designed to help caregivers and seniors. Nvidia also revealed Project DIGITS, a personal AI supercomputer that makes AI technology more accessible to researchers and students.
In addition, I was impressed by the Hormometer from Eli Health, a device that allows real-time hormone testing via saliva analysis; German Bionic's Apogee Ultra exoskeleton, designed to assist workers with physically demanding tasks; and Reelables' innovative package tracking technology using paper-based electronics. Lenovo Legion unveiled a lineup of next-generation gaming devices, including the Lenovo Legion Go S, the first officially licensed third-party handheld powered by SteamOS, a device that could pose competition for Nintendo's rumored Switch 2 gaming hybrid.
In conclusion, CES 2025 was a showcase of the future of technology, evidence of the rapid advancements in AI, robotics, and other fields. These technologies have the potential to revolutionize various aspects of our lives, from healthcare to gaming, making them more efficient and accessible.
After six decades of dreaming and trials, the dawn of a new era in technology-driven education is upon us. A recent development is the Arizona State Board for Charter Schools' approval of an online school, Unbound Academy, which replaces traditional teachers with AI teaching assistants and promises to deliver 2.4 times the academic growth for students compared to conventional schools. This is not just another tech experiment: it represents a significant milestone in a 60-year journey to revolutionize education through Computer Assisted Instruction (CAI), and if successful, it could be the realization of a long-held dream.
The concept of using computers to aid student learning dates back to the 1950s, with the first application, Programmed Logic for Automatic Teaching Operations (PLATO), launched in 1961. PLATO offered interactive lessons and real-time feedback using terminals connected to a time-share computer system via telephone lines. Despite its innovative approach, PLATO failed due to high operating costs. In the early 2000s, immersive, experimental learning took a new turn with Second Life, a virtual world where users interacted as avatars. Although not explicitly a CAI tool, Second Life showcased the potential of immersive virtual learning environments, but it struggled to gain traction due to a complex user interface, high technical requirements, a steep learning curve, and scalability issues.
The arrival of generative AI, built on the transformer architecture introduced in 2017, marked a turning point in CAI. Tools like Writable and Photomath enhanced both teaching and learning; Writable, for instance, uses AI to provide feedback on student writing, assisting teachers with large workloads. Such tools underscore AI's potential to address the resource constraints of traditional education. Taking a more comprehensive approach, Khan Academy has offered free online education tutorials since 2008, and in 2023 the company launched Khanmigo, an interactive AI tutor for students. In a 2023 TED Talk, Khan Academy's founder, Sal Khan, discussed the potential of Khanmigo to improve student performance, citing the 1984 paper "The 2 Sigma Problem" by education professor Benjamin Bloom, which found that students who received individualized tutoring performed about two standard deviations better than those taught in conventional classrooms. Khan argues that AI-infused technology like Khanmigo effectively overcomes the resource constraints that make human tutors scarce, potentially emulating their benefits. Despite some criticisms of Bloom's paper, there is a consensus that technology could improve educational outcomes. However, the adoption of AI tools raises questions about the role of human connection in learning.
This brings us back to Unbound Academy. Students will spend two hours online each school morning, working through AI-driven lessons in math, reading, and science. Tools like Khanmigo and IXL will personalize the instruction and analyze progress, adjusting the difficulty and content in real time to optimize learning outcomes. The model significantly reduces the role of human teachers, raising questions about the impact on students and on the teaching profession. Unbound Academy's model is already in use in several private schools, and the results so far support its claimed advantages. However, the impact of a computer-based model on a student's ability to form human connections outside of a traditional school setting remains unclear.
These complexities underline the challenges schools like Unbound Academy face as they redefine the educational landscape. Khanmigo is currently being piloted in 266 school districts in the U.S. in grades three through 12. This pilot program offers a glimpse into how AI could integrate into existing education systems, supporting both teachers and students by enhancing lesson planning, saving time, and providing real-time insights into student progress. CAI has come a long way since PLATO, and if AI-driven models succeed, they could democratize access to high-quality instruction. While AI has the potential to widen existing disparities, it also offers unprecedented opportunities to bring quality education to underserved communities. As schools like Unbound Academy and those piloting Khanmigo pioneer AI-driven teaching models, they are not just testing a new educational approach; they are challenging our fundamental assumptions about how learning happens and what role human teachers should play in that process. The results could reshape education for generations to come.