Conducting a Qualitative Analysis by Comparing the Outputs of Our Think-and-Execute Framework

25 Mar 2025

Table of Links

Abstract and 1. Introduction

7 Limitations and Discussion

8 Conclusion and References

A Experimental Details

B Details of Think-and-Execute

C Prompts Used in Our Experiments

D Human-written Pseudocode Prompts

E Generated Analyses

F Generated Pseudocode Prompts

G Qualitative Analysis

We conduct a qualitative analysis by comparing the outputs of our approach (THINK-AND-EXECUTE) with those of the baseline methods. This comparison is presented in Tables 7, 8, 9, 10, 11, 12, and 13.

Table 7: A comparison of results for Dyck Languages between the baseline methods and THINK-AND-EXECUTE.

Table 8: A comparison of results for Geometric Shapes between the baseline methods and THINK-AND-EXECUTE.

Table 9: A comparison of results for Navigate between the baseline methods and THINK-AND-EXECUTE.

Table 10: A comparison of results for Reasoning about Colored Objects between the baseline methods and THINK-AND-EXECUTE.

Table 11: A comparison of results for Temporal Sequences between the baseline methods and THINK-AND-EXECUTE.

Table 12: A comparison of results for Tracking Shuffled Objects between the baseline methods and THINK-AND-EXECUTE.

Table 13: A comparison of results for Web of Lies between the baseline methods and THINK-AND-EXECUTE.

This paper is available on arXiv under the CC BY-NC-ND 4.0 DEED license.

Authors:

(1) Hyungjoo Chae, Yonsei University;

(2) Yeonghyeon Kim, Yonsei University;

(3) Seungone Kim, KAIST AI;

(4) Kai Tzu-iunn Ong, Yonsei University;

(5) Beong-woo Kwak, Yonsei University;

(6) Moohyeon Kim, Yonsei University;

(7) Seonghwan Kim, Yonsei University;

(8) Taeyoon Kwon, Yonsei University;

(9) Jiwan Chung, Yonsei University;

(10) Youngjae Yu, Yonsei University;

(11) Jinyoung Yeo, Yonsei University.
