
    This AI Paper Introduces an LLM+FOON Framework: A Graph-Validated Approach for Robotic Cooking Task Planning from Video Instructions

    April 8, 2025

    Robots are increasingly being developed for home environments, specifically to enable them to perform daily activities like cooking. These tasks involve a combination of visual interpretation, manipulation, and decision-making across a series of actions. Cooking, in particular, is complex for robots due to the diversity in utensils, varying visual perspectives, and frequent omissions of intermediate steps in instructional materials like videos. For a robot to succeed in such tasks, a method is needed that ensures logical planning, flexible understanding, and adaptability to different environmental constraints.

    One major problem in translating cooking demonstrations into robotic tasks is the lack of standardization in online content. Videos might skip steps, include irrelevant segments like introductions, or show arrangements that do not align with the robot’s operational layout. Robots must interpret visual data and textual cues, infer omitted steps, and translate this into a sequence of physical actions. However, when relying purely on generative models to produce these sequences, there is a high chance of logic failures or hallucinated outputs that render the plan infeasible for robotic execution.

    Current tools supporting robotic planning often focus on logic-based models like PDDL or on more recent data-driven approaches using Large Language Models (LLMs) or multimodal architectures. While LLMs are adept at reasoning from diverse inputs, they often cannot validate whether a generated plan makes sense in a robotic setting. Prompt-based feedback mechanisms have been tested, but they still fail to confirm the logical correctness of individual actions, especially for complex, multi-step tasks like those in cooking scenarios.

    Researchers from the University of Osaka and the National Institute of Advanced Industrial Science and Technology (AIST), Japan, introduced a new framework integrating an LLM with a Functional Object-Oriented Network (FOON) to develop cooking task plans from subtitle-enhanced videos. This hybrid system uses an LLM to interpret a video and generate task sequences. These sequences are then converted into FOON-based graphs, where each action is checked for feasibility against the robot’s current environment. If a step is deemed infeasible, feedback is generated so that the LLM can revise the plan accordingly, ensuring that only logically sound steps are retained.
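The generate, validate, and revise cycle described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: `plan_with_validation`, `toy_propose`, and `toy_validate` are hypothetical names, the planner is a hard-coded stand-in for an LLM call, and the validator is a single hand-written rule rather than a FOON graph check.

```python
from typing import Callable, List, Optional, Tuple

Plan = List[str]


def plan_with_validation(
    propose: Callable[[Optional[str]], Plan],
    validate: Callable[[Plan], Optional[str]],
    max_attempts: int = 5,
) -> Tuple[Plan, int]:
    """Request a plan, validate it, and feed the error back until it passes.

    `propose` stands in for the LLM planner (hypothetical); it receives the
    previous validation message, or None on the first attempt.
    `validate` stands in for the graph check (hypothetical); it returns None
    when the plan is feasible, otherwise a feedback message.
    """
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        plan = propose(feedback)
        feedback = validate(plan)
        if feedback is None:
            return plan, attempt
    raise RuntimeError(f"no feasible plan after {max_attempts} attempts: {feedback}")


def toy_propose(feedback: Optional[str]) -> Plan:
    # Toy planner: only adds the cutting step once the feedback mentions it.
    plan = ["pick onion", "put onion in pan"]
    if feedback is not None and "cut" in feedback:
        plan.insert(1, "cut onion")
    return plan


def toy_validate(plan: Plan) -> Optional[str]:
    # Toy rule: the onion has to be cut before it goes in the pan.
    if "put onion in pan" in plan and "cut onion" not in plan:
        return "onion must be cut before it goes in the pan"
    return None


plan, attempts = plan_with_validation(toy_propose, toy_validate)
print(plan, attempts)
```

Capping the number of attempts is a design choice of this sketch: it keeps the loop from running forever when the planner never produces a feasible plan.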

    This method involves several layers of processing. First, the cooking video is split into segments based on subtitles extracted using Optical Character Recognition. Key video frames are selected from each segment and arranged into a 3×3 grid to serve as input images. The LLM is prompted with structured details, including task descriptions, known constraints, and environment layouts. Using this data, it infers the target object states for each segment. These are cross-verified by FOON, a graph system where actions are represented as functional units containing input and output object states. If an inconsistency is found—for instance, if a hand is already holding an item when it’s supposed to pick something else—the task is flagged and revised. This loop continues until a complete and executable task graph is formed.
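To make the functional-unit check concrete, the sketch below models each action as a unit with required input states and produced output states, then simulates the units in order against the environment. It is an assumption-laden simplification: the `FunctionalUnit` class, the `first_infeasible` helper, and the state strings are hypothetical and do not reproduce the actual FOON data format.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

State = Dict[str, str]  # object name -> state, e.g. {"knife": "on table"}


@dataclass
class FunctionalUnit:
    action: str
    inputs: State   # object states required before the action
    outputs: State  # object states produced by the action


def first_infeasible(units: List[FunctionalUnit], env: State) -> Optional[str]:
    """Simulate units in order; describe the first unmet input, or return None."""
    state = dict(env)
    for i, unit in enumerate(units):
        for obj, required in unit.inputs.items():
            actual = state.get(obj)
            if actual != required:
                return f"step {i} ({unit.action}): {obj} is {actual!r}, needs {required!r}"
        state.update(unit.outputs)
    return None


env = {"right_hand": "empty", "knife": "on table", "onion": "on board"}

pick_knife = FunctionalUnit(
    "pick knife",
    inputs={"right_hand": "empty", "knife": "on table"},
    outputs={"right_hand": "holding knife", "knife": "in hand"},
)
place_knife = FunctionalUnit(
    "place knife",
    inputs={"right_hand": "holding knife"},
    outputs={"right_hand": "empty", "knife": "on table"},
)
pick_onion = FunctionalUnit(
    "pick onion",
    inputs={"right_hand": "empty", "onion": "on board"},
    outputs={"right_hand": "holding onion", "onion": "in hand"},
)

# The hand is still holding the knife, so picking the onion is flagged.
bad_feedback = first_infeasible([pick_knife, pick_onion], env)
# Putting the knife down first makes the sequence consistent.
good_feedback = first_infeasible([pick_knife, place_knife, pick_onion], env)
print(bad_feedback)
print(good_feedback)
```

The message returned for the flagged step is the kind of feedback that could be handed back to the planner for revision.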

    The researchers tested their method using five full cooking recipes from ten videos. The FOON-enhanced method generated complete and feasible task plans for four of the five recipes, an 80% success rate (4/5), while a baseline that used only the LLM without FOON validation succeeded in just one case, or 20% (1/5). In the component evaluation of target object node estimation, the system predicted object states correctly 86% of the time. During video preprocessing, the OCR process extracted 270 subtitle words against a ground truth of 230, a 17% error rate, which the LLM could still manage by filtering redundant instructions.
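The reported percentages follow directly from the counts. The snippet below reproduces them, assuming the OCR error rate is the number of surplus extracted words divided by the ground-truth count; the article does not state the exact formula, so that definition is an assumption.

```python
# Plan-generation success rates from the reported counts.
foon_rate = 4 / 5       # LLM with FOON validation
baseline_rate = 1 / 5   # LLM only

# Assumed definition: surplus extracted words relative to the ground truth.
ocr_extracted = 270
ocr_ground_truth = 230
ocr_error = (ocr_extracted - ocr_ground_truth) / ocr_ground_truth

print(f"FOON-validated: {foon_rate:.0%}")
print(f"LLM-only baseline: {baseline_rate:.0%}")
print(f"OCR error: {ocr_error:.0%}")
```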

    In a real-world trial using a dual-arm UR3e robot system, the team demonstrated their method on a gyudon (beef bowl) recipe. The robot inferred and inserted a "cut" action that was missing from the video, showing the system's ability to identify and compensate for incomplete instructions. The task graph for the recipe was generated after three re-planning attempts, and the robot completed the cooking sequence successfully. The LLM also correctly ignored non-essential scenes like the video introduction, identifying only 8 of the 13 segments as necessary for task execution.

    This research addresses the problem of hallucination and logical inconsistency in LLM-based robotic task planning. By incorporating FOON as a validation and correction mechanism, the proposed method generates actionable plans from unstructured cooking videos. The methodology bridges reasoning and logical verification, enabling robots to execute complex tasks by adapting to environmental conditions while maintaining task accuracy.


    Check out the Paper. All credit for this research goes to the researchers of this project.


    The post This AI Paper Introduces an LLM+FOON Framework: A Graph-Validated Approach for Robotic Cooking Task Planning from Video Instructions appeared first on MarkTechPost.
