D-RMGPT: Robot-assisted collaborative tasks driven by large multimodal models



D-RMGPT: System Overview



  • DetGPT-V: Taking as input an image of the component list and two captured RGB frames of the scene, detects the components already assembled on the workbench.
  • R-ManGPT: Based on the detected components, and keeping in memory the components already brought by the robot (but possibly not yet installed by the operator), decides which component should be assembled next and plans the robot movements to bring it to the operator.
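The bookkeeping behind the two modules can be illustrated with a minimal sketch. It is a rule-based stand-in, not the authors' method: the page describes R-ManGPT as making this decision, while the function `next_component`, the fixed `assembly_order` list, and the component names below are all hypothetical and used only for illustration.

```python
from typing import Optional, Sequence, Set


def next_component(
    assembly_order: Sequence[str],
    detected: Set[str],
    delivered: Set[str],
) -> Optional[str]:
    """Pick the next component to bring to the operator.

    assembly_order: hypothetical nominal order of the components.
    detected: components reported as already assembled on the workbench.
    delivered: components the robot has already brought, whether or not
        the operator has installed them yet.
    """
    for name in assembly_order:
        # Skip anything already assembled or already handed over.
        if name not in detected and name not in delivered:
            return name
    return None  # nothing left to deliver


# Hypothetical component names, for illustration only.
order = ["fuselage", "wing", "tail"]
print(next_component(order, detected={"fuselage"}, delivered={"wing"}))  # tail
```

Tracking delivered components separately from detected ones keeps the robot from fetching a part twice when the operator has received it but not yet installed it.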

D-RMGPT Tests and Evaluations



    D-RMGPT has been evaluated with operators of different experience levels:

  • Inexperienced Operators: 12 tests with 12 different operators who assembled the product for the first time, assisted by D-RMGPT.
  • Experienced Operators: 3 tests with skilled operators who changed the assembly sequence suggested by D-RMGPT, demonstrating its flexibility and recoverability.
  • Inexperienced Operators, Manual Assembly Process: 10 different operators first assembled the product following a classical approach, reading the assembly instructions provided by the manufacturer. The time required with that procedure has been compared with the time required with the assisted approach.

Assembly Process Examples

    Assembly process followed by an inexperienced operator.



    Assembly process followed by an experienced operator.



    Example of failure of the system due to an incorrect detection.




    D-RMGPT Performance

    Test results, inexperienced operators. Of the 12 tests conducted with inexperienced operators, 10 ended with a successfully completed assembly and 2 ended in assembly failure caused by a detection failure, giving a success rate of 83%. For each test, the following are reported: the final completion time of the assembly, the average time the framework required to suggest the next component for the operator to install, and any false positives and false negatives in the detection process.



    Test results, experienced operators. Three tests were conducted with experienced operators, who altered the assembly sequence differently each time to demonstrate the system's flexibility and ability to recover from unplanned situations. In all three tests, the operator successfully completed the assembly.



    Detector comparison with other VLM-based object detection systems. The DetGPT-V module has been compared, in terms of Precision and Recall, with two other state-of-the-art VLM-based object detectors.
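For reference, Precision and Recall follow from the counts of true positives, false positives, and false negatives. The sketch below uses the standard definitions; the function name `precision_recall`, the choice to return 0.0 when a denominator is zero, and the example counts are assumptions for illustration and are not taken from the reported results.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Standard detection metrics from raw counts.

    precision = TP / (TP + FP): share of reported detections that are correct.
    recall    = TP / (TP + FN): share of present components that were found.
    Returning 0.0 for an empty denominator is a convention assumed here.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up counts, not data from the experiments.
print(precision_recall(tp=8, fp=2, fn=2))  # (0.8, 0.8)
```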

    Assembly Performance Improvements Thanks to D-RMGPT

    As a final result, the time required by inexperienced operators who had never seen the product and assembled it using the D-RMGPT framework is compared with the time required by operators with the same knowledge level (no prior knowledge) who assembled the aircraft in the traditional manner, following the instruction manual provided by the manufacturer. A total of 20 operators were involved: 10 for D-RMGPT-assisted assembly and 10 for manual assembly.


    The average assembly time with D-RMGPT is 310.8 s, which is 33.4% lower than the 466.8 s average for manual assembly. The standard deviations also show clearly that the case on the right has less variation in the time required.
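The 33.4% figure can be checked from the two reported averages. The short sketch below computes the relative reduction with respect to the manual baseline; the function name `relative_reduction` is an illustrative choice.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Percentage by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline * 100.0


# Average times reported above, in seconds.
manual_avg = 466.8
assisted_avg = 310.8

print(f"{relative_reduction(manual_avg, assisted_avg):.1f}%")  # 33.4%
```

In absolute terms, the difference between the two averages is 156 s.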


    Prompts used

    Paper

    BibTeX

    @misc{forlini2024drmgptrobotassistedcollaborativetasks,
      title={D-RMGPT: Robot-assisted collaborative tasks driven by large multimodal models}, 
      author={M. Forlini and M. Babcinschi and G. Palmieri and P. Neto},
      year={2024},
      eprint={2408.11761},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2408.11761}, 
    }