diff --git a/src/README.md b/src/README.md
new file mode 100644
index 0000000..5bf0eba
--- /dev/null
+++ b/src/README.md
@@ -0,0 +1,6 @@
+# TravelPlanner
+
+This website is adapted from the [Nerfies website](https://nerfies.github.io).
+
+# Website License
+
+This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
diff --git a/src/index.html b/src/index.html
new file mode 100644
index 0000000..71e58e8
--- /dev/null
+++ b/src/index.html
@@ -0,0 +1,1372 @@
+
+
+
+
+Dataset
+Leaderboard
+Environment
+Twitter
+
+We introduce TravelPlanner, a comprehensive benchmark designed to evaluate the planning abilities of language agents in real-world scenarios across multiple dimensions.
+Without loss of generality, TravelPlanner uses travel planning as its test environment, with all relevant information carefully crafted to minimize data contamination.
+TravelPlanner does not have a single ground truth for each query.
+Instead, the benchmark uses several predefined evaluation scripts to assess each tested plan, determining whether the language agent can use tools effectively to create a plan that satisfies both the implicit commonsense requirements and the explicit user needs stated in the query (i.e., commonsense constraints and hard constraints).
+Every query in TravelPlanner has been verified by humans to guarantee that a feasible solution exists.
+Additionally, TravelPlanner evaluates the language agent's capability by varying the breadth and depth of planning, controlled through the number of travel days and the number of hard constraints.
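Because there is no single reference plan, evaluation reduces to running each candidate plan through a set of independent checks. The JavaScript below is a minimal sketch of that idea, not the benchmark's actual evaluation scripts: the plan format (an array of day objects) and the two checkers, `hasAccommodation` and `noRepeatedAttractions`, are hypothetical.

```javascript
// Hypothetical sketch: judge one plan with independent checks instead of a reference answer.
// The plan format and both checkers below are illustrative assumptions.
const checkers = {
  // Every day except the last (assumed to be the return day) names an accommodation.
  hasAccommodation: (plan) =>
    plan.slice(0, -1).every((day) => day.accommodation !== null),
  // No attraction appears twice across the whole trip.
  noRepeatedAttractions: (plan) => {
    const all = plan.flatMap((day) => day.attractions);
    return new Set(all).size === all.length;
  },
};

// Run every checker; the plan passes only if all of them succeed.
function evaluatePlan(plan, checks = checkers) {
  const results = {};
  for (const [name, check] of Object.entries(checks)) {
    results[name] = check(plan);
  }
  const passed = Object.values(results).every(Boolean);
  return { results, passed };
}

const plan = [
  { day: 1, accommodation: "Hotel A", attractions: ["City Museum"] },
  { day: 2, accommodation: "Hotel A", attractions: ["Riverside Park"] },
  { day: 3, accommodation: null, attractions: ["City Museum"] },
];
console.log(JSON.stringify(evaluatePlan(plan)));
// → {"results":{"hasAccommodation":true,"noRepeatedAttractions":false},"passed":false}
```

With this design, adding a constraint means adding one more entry to `checkers`; no per-query gold plan is needed.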
+We introduce TravelPlanner, a benchmark for evaluating language agents on tool use and complex planning under multiple constraints.
+Grounded in travel planning, a real-world use case that naturally involves diverse constraints such as user needs and commonsense constraints in the environment, TravelPlanner evaluates whether language agents can develop reasonable travel plans by collecting information with diverse tools and making decisions while satisfying those constraints.
+For a given query, language agents are expected to produce a comprehensive plan that covers transportation, daily meals, attractions, and accommodation for each day.
+From the perspective of real-world applications, we design three types of constraints: Environment Constraint, Commonsense Constraint, and Hard Constraint.
+TravelPlanner comprises 1,225 queries in total. The number of days and the number of hard constraints are varied to test agents' abilities across both the breadth and depth of complex planning.
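A plan that covers transportation, daily meals, attractions, and accommodation for each day can be pictured as one record per day. The sketch below assumes a hypothetical record with the field names `transportation`, `breakfast`, `lunch`, `dinner`, `attraction`, and `accommodation` (none of them taken from the benchmark's released format) and reports every day on which one of them is empty.

```javascript
// Hypothetical per-day fields; the benchmark's real plan format may differ.
const REQUIRED_FIELDS = ["transportation", "breakfast", "lunch", "dinner", "attraction", "accommodation"];

// Return "day N: field" for every field that is missing, null, or an empty string.
function findMissingFields(plan) {
  const missing = [];
  plan.forEach((day, i) => {
    for (const field of REQUIRED_FIELDS) {
      const value = day[field];
      if (value === undefined || value === null || value === "") {
        missing.push(`day ${i + 1}: ${field}`);
      }
    }
  });
  return missing;
}

const twoDayPlan = [
  { transportation: "Flight F001", breakfast: "Cafe X", lunch: "Diner Y", dinner: "Bistro Z",
    attraction: "City Museum", accommodation: "Hotel A" },
  { transportation: "Flight F002", breakfast: "Cafe X", lunch: "", dinner: "Bistro Z",
    attraction: "Old Town", accommodation: null },
];
console.log(JSON.stringify(findMissingFields(twoDayPlan)));
// → ["day 2: lunch","day 2: accommodation"]
```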
+The benchmark is divided into training, validation, and test sets.
+ Dataset distribution of TravelPlanner. +
+Examples in the training set:
+
+ Easy Level & 3-day
+
+ Easy Level & 5-day
+
+ Easy Level & 7-day
+
+ Medium Level & 3-day
+
+ Medium Level & 5-day
+
+ Medium Level & 7-day
+
+ Hard Level & 3-day
+
+ Hard Level & 5-day
+
+ Hard Level & 7-day
+TravelPlanner
+Constraint description. The environment constraint is reflected in the feedback received from the environment and tests whether the language agent can adjust its plan accordingly. The commonsense constraints and hard constraints are evaluated by how well the language agent's plan satisfies these specific criteria.
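One way to picture the environment constraint: a tool call that finds nothing returns feedback rather than data, and the agent is expected to change course instead of inventing a result. In this sketch the flight table, `searchFlights`, and the fall-back-to-driving policy in `pickTransport` are all hypothetical.

```javascript
// Hypothetical in-memory flight table standing in for a search tool's database.
const flights = [
  { origin: "Seattle", destination: "Denver", date: "2022-03-10", id: "F100" },
];

// The tool returns either results or a textual feedback message; it never throws.
function searchFlights(origin, destination, date) {
  const hits = flights.filter(
    (f) => f.origin === origin && f.destination === destination && f.date === date
  );
  return hits.length > 0
    ? { ok: true, results: hits }
    : { ok: false, feedback: `No flights from ${origin} to ${destination} on ${date}.` };
}

// Hypothetical adjustment policy: on negative feedback, switch mode instead of inventing a flight.
function pickTransport(origin, destination, date) {
  const response = searchFlights(origin, destination, date);
  if (response.ok) return { mode: "flight", detail: response.results[0].id };
  return { mode: "self-driving", detail: response.feedback };
}

console.log(JSON.stringify(pickTransport("Seattle", "Denver", "2022-03-10")));
// → {"mode":"flight","detail":"F100"}
console.log(JSON.stringify(pickTransport("Seattle", "Austin", "2022-03-10")));
// → {"mode":"self-driving","detail":"No flights from Seattle to Austin on 2022-03-10."}
```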
+Tool description and the number of items in the database. The original data for each tool is sourced from publicly available internet data. We then modify this data by adding, deleting, and altering certain keys and values to suit our requirements. In this way, we effectively avoid data contamination.
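The modification step can be written as a pure function over one scraped record. The helper `perturbRecord` and the particular edits in the example (dropping `phone`, raising `price` by 10%, adding `house_rules`) are hypothetical illustrations, not the edits actually applied to the TravelPlanner databases.

```javascript
// Return a modified copy of a record: drop some keys, rewrite others, then add new ones.
// The edits used in the example call are hypothetical.
function perturbRecord(record, { drop = [], alter = {}, add = {} } = {}) {
  const out = { ...record }; // never mutate the scraped original
  for (const key of drop) delete out[key];
  for (const [key, fn] of Object.entries(alter)) {
    if (key in out) out[key] = fn(out[key]);
  }
  return { ...out, ...add }; // added keys are applied last and may overwrite
}

const scraped = { name: "Hotel A", price: 120, phone: "555-0100" };
const modified = perturbRecord(scraped, {
  drop: ["phone"],
  alter: { price: (p) => Math.round(p * 1.1) },
  add: { house_rules: "No pets" },
});
console.log(JSON.stringify(modified));
// → {"name":"Hotel A","price":132,"house_rules":"No pets"}
```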
+Main results of different LLMs and planning strategies on the TravelPlanner validation and test sets. The best results are marked in bold.
+Tool-use error distribution on the test set. The maximum number of tool-use steps is set to 30. An agent triggers an early stop if it either makes three consecutive failed attempts or repeats the same action three times in a row, which indicates a dead loop.
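The early-stop rule can be sketched as a loop over an agent's tool-use steps. The step format `{ action, ok }` and the action strings are assumptions; only the limits (30 steps, three consecutive failures, three identical consecutive actions) come from the caption.

```javascript
// Sketch of the early-stop rule; the step format { action, ok } is an assumption.
const MAX_STEPS = 30;     // maximum number of tool-use steps
const STOP_THRESHOLD = 3; // three consecutive failures, or the same action three times in a row

function runWithEarlyStop(steps) {
  let consecutiveFailures = 0;
  let repeatCount = 0;
  let lastAction = null;
  const limit = Math.min(steps.length, MAX_STEPS);

  for (let i = 0; i < limit; i++) {
    const { action, ok } = steps[i];
    // Reset the failure streak on success; otherwise extend it.
    consecutiveFailures = ok ? 0 : consecutiveFailures + 1;
    // Count how many times in a row the same action has been issued.
    repeatCount = action === lastAction ? repeatCount + 1 : 1;
    lastAction = action;

    if (consecutiveFailures >= STOP_THRESHOLD) {
      return { stoppedAt: i + 1, reason: "three consecutive failures" };
    }
    if (repeatCount >= STOP_THRESHOLD) {
      return { stoppedAt: i + 1, reason: "same action three times in a row (dead loop)" };
    }
  }
  // No early stop: either the trajectory ended or the step budget ran out.
  return steps.length > MAX_STEPS
    ? { stoppedAt: MAX_STEPS, reason: "step limit" }
    : { stoppedAt: steps.length, reason: "finished" };
}

const loop = [
  { action: "search flights Seattle-Denver", ok: true },
  { action: "search flights Seattle-Denver", ok: true },
  { action: "search flights Seattle-Denver", ok: true },
];
console.log(JSON.stringify(runWithEarlyStop(loop)));
// → {"stoppedAt":3,"reason":"same action three times in a row (dead loop)"}
```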
+Constraint pass rate of GPT-4-Turbo on the test set. The results in sole-planning mode are based on the Direct strategy. Note that plans failing the "Within Sandbox" or "No Missed Key Information" criteria are excluded from the hard constraint pass rate calculation, because information outside the sandbox's scope, or key details that are missing, cannot be effectively searched or evaluated.
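A minimal sketch of the exclusion rule, under the assumption that an excluded plan is dropped from both the numerator and the denominator of the pass rate. The flag names `withinSandbox` and `noMissedKeyInfo` and the `budget` constraint key are hypothetical.

```javascript
// Hard-constraint pass rate over eligible plans only (assumed exclusion semantics).
function hardConstraintPassRate(plans, constraint) {
  // Plans that leave the sandbox or miss key information are not scored at all.
  const eligible = plans.filter((p) => p.withinSandbox && p.noMissedKeyInfo);
  if (eligible.length === 0) return null; // nothing to score
  const passed = eligible.filter((p) => p.hard[constraint] === true).length;
  return passed / eligible.length;
}

const evaluated = [
  { withinSandbox: true, noMissedKeyInfo: true, hard: { budget: true } },
  { withinSandbox: true, noMissedKeyInfo: true, hard: { budget: false } },
  { withinSandbox: false, noMissedKeyInfo: true, hard: { budget: true } },  // excluded
  { withinSandbox: true, noMissedKeyInfo: false, hard: { budget: false } }, // excluded
];
console.log(hardConstraintPassRate(evaluated, "budget")); // → 0.5
```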
+Comparison of information collection counts between GPT-4-Turbo and the reference. The results of GPT-4-Turbo are based on the number of entries it writes into working memory through "NotebookWrite".
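The quantity compared here is a count of working-memory entries. A hypothetical `Notebook` class makes the bookkeeping concrete: each `write` call (standing in for one NotebookWrite action) appends exactly one entry, and `size` is the number being compared. The class and its method names are assumptions, not the benchmark's implementation.

```javascript
// Hypothetical working memory: one write call stands in for one NotebookWrite action.
class Notebook {
  constructor() {
    this.entries = [];
  }

  // Store a short description together with the tool output it summarizes.
  write(description, content) {
    this.entries.push({ description, content });
    return `Stored entry ${this.entries.length}: ${description}`;
  }

  // Number of entries written so far.
  get size() {
    return this.entries.length;
  }
}

const notebook = new Notebook();
notebook.write("Flights from Seattle to Denver on 2022-03-10", [{ id: "F100" }]);
notebook.write("Restaurants in Denver", [{ name: "Bistro Z" }]);
console.log(notebook.size); // → 2
```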
+GPT-4-Turbo + ReAct in the tool-use scenario.
+GPT-4-Turbo + Direct Planning in the sole-planning scenario.
+GPT-4-Turbo + Reflexion Planning in the sole-planning scenario.
+@article{Xie2024TravelPlanner,
+ author = {},
+ title = {TravelPlanner: Toward Real-World Planning with Language Agents},
+ journal = {},
+ year = {2024}
+}
+ Question
+${question}
+    `;
+    else
+        html = `
+Question
+${question} (unit: ${unit})
+    `;
+    return html;
+}
+
+function make_img(path) {
+    if (path === null) return "";
+    let html = ``;
+    return html;
+}
+
+function make_box(contents, cls = "") {
+    if (contents.join("").length === 0) return "";
+    let html = `
+Choices
+Choices
+${choice}
+`;
+    return html;
+}
+
+function make_answer(answer) {
+    let html = `Answer
+${answer}
+`;
+    return html;
+}
\ No newline at end of file
diff --git a/src/static/js/sort-table.js b/src/static/js/sort-table.js
new file mode 100644
index 0000000..cfa2c24
--- /dev/null
+++ b/src/static/js/sort-table.js
@@ -0,0 +1,309 @@
+/**
+ * sort-table.js
+ * A pure JavaScript (no dependencies) solution to make HTML
+ * Tables sortable
+ *
+ * Copyright (c) 2013 Tyler Uebele
+ * Released under the MIT license. See included LICENSE.txt
+ * or http://opensource.org/licenses/MIT
+ *
+ * latest version available at https://github.com/tyleruebele/sort-table
+ */
+
+/**
+ * Sort the rows in a HTML Table
+ *
+ * @param Table The Table DOM object
+ * @param col   The zero-based column number by which to sort
+ * @param dir   Optional. The sort direction; pass 1 for asc; -1 for desc
+ * @returns void
+ */
+function sortTable(Table, col, dir) {
+    var sortClass, i;
+
+    // get previous sort column
+    sortTable.sortCol = -1;
+    sortClass = Table.className.match(/js-sort-\d+/);
+    if (null != sortClass) {
+        sortTable.sortCol = sortClass[0].replace(/js-sort-/, '');
+        Table.className = Table.className.replace(new RegExp(' ?' + sortClass[0] + '\\b'), '');
+    }
+    // If sort column was not passed, use previous
+    if ('undefined' === typeof col) {
+        col = sortTable.sortCol;
+    }
+
+    if ('undefined' !== typeof dir) {
+        // Accept -1 or 'desc' for descending. All else is ascending
+        sortTable.sortDir = dir == -1 || dir == 'desc' ? -1 : 1;
+    } else {
+        // sort direction was not passed, use opposite of previous
+        sortClass = Table.className.match(/js-sort-(a|de)sc/);
+        if (null != sortClass && sortTable.sortCol == col) {
+            sortTable.sortDir = 'js-sort-asc' == sortClass[0] ? -1 : 1;
+        } else {
+            sortTable.sortDir = 1;
+        }
+    }
+    Table.className = Table.className.replace(/ ?js-sort-(a|de)sc/g, '');
+
+    // update sort column
+    Table.className += ' js-sort-' + col;
+    sortTable.sortCol = col;
+
+    // update sort direction
+    Table.className += ' js-sort-' + (sortTable.sortDir == -1 ? 'desc' : 'asc');
+
+    // get sort type
+    if (col < Table.tHead.rows[Table.tHead.rows.length - 1].cells.length) {
+        sortClass = Table.tHead.rows[Table.tHead.rows.length - 1].cells[col].className.match(/js-sort-[-\w]+/);
+    }
+    // Improved support for colspan'd headers
+    for (i = 0; i < Table.tHead.rows[Table.tHead.rows.length - 1].cells.length; i++) {
+        if (col == Table.tHead.rows[Table.tHead.rows.length - 1].cells[i].getAttribute('data-js-sort-colNum')) {
+            sortClass = Table.tHead.rows[Table.tHead.rows.length - 1].cells[i].className.match(/js-sort-[-\w]+/);
+        }
+    }
+    if (null != sortClass) {
+        sortTable.sortFunc = sortClass[0].replace(/js-sort-/, '');
+    } else {
+        sortTable.sortFunc = 'string';
+    }
+    // Set the headers for the active column to have the decorative class
+    Table.querySelectorAll('.js-sort-active').forEach(function(Node) {
+        Node.className = Node.className.replace(/ ?js-sort-active\b/, '');
+    });
+    Table.querySelectorAll('[data-js-sort-colNum="' + col + '"]:not(:empty)').forEach(function(Node) {
+        Node.className += ' js-sort-active';
+    });
+
+    // sort!
+    var rows = [],
+        TBody = Table.tBodies[0];
+
+    for (i = 0; i < TBody.rows.length; i++) {
+        rows[i] = TBody.rows[i];
+    }
+    if ('none' != sortTable.sortFunc) {
+        rows.sort(sortTable.compareRow);
+    }
+
+    while (TBody.firstChild) {
+        TBody.removeChild(TBody.firstChild);
+    }
+    for (i = 0; i < rows.length; i++) {
+        TBody.appendChild(rows[i]);
+    }
+}
+
+/**
+ * Compare two table rows based on current settings
+ *
+ * @param RowA A TR DOM object
+ * @param RowB A TR DOM object
+ * @returns {number} 1 if RowA is greater, -1 if RowB, 0 if equal
+ */
+sortTable.compareRow = function(RowA, RowB) {
+    var valA, valB;
+    if ('function' != typeof sortTable[sortTable.sortFunc]) {
+        sortTable.sortFunc = 'string';
+    }
+    valA = sortTable[sortTable.sortFunc](RowA.cells[sortTable.sortCol]);
+    valB = sortTable[sortTable.sortFunc](RowB.cells[sortTable.sortCol]);
+
+    return valA == valB ? 0 : sortTable.sortDir * (valA > valB ? 1 : -1);
+};
+
+/**
+ * Strip all HTML tags. Values with an "M" or "B" magnitude suffix
+ * (e.g. "7B") are first expanded to plain numbers so they sort numerically.
+ * @param html
+ * @returns {string}
+ */
+sortTable.stripTags = function(html) {
+    // If everything before the first 'M' (or, failing that, the first 'B') parses
+    // as a number, scale it by 1e6 or 1e9; otherwise return s unchanged.
+    const replace_unit = (s) => {
+        let iUnit = (s.indexOf('M') > -1) ? s.indexOf('M') : s.indexOf('B');
+        if (iUnit == -1) return s;
+        // Guard against text such as "Mistral" where nothing precedes the letter:
+        // Number('') is 0, which would otherwise turn the whole cell into "0".
+        let prefix = s.substring(0, iUnit).trim();
+        if (prefix === '') return s;
+        let unit = s[iUnit];
+        let val = Number(prefix);
+        if (isNaN(val)) return s;
+        val *= (unit == 'M') ? 1000000 : 1000000000;
+        return val.toString();
+    };
+    html = replace_unit(html);
+    return html.replace(/<\/?[a-z][a-z0-9]*\b[^>]*>/gi, '');
+};
+
+/**
+ * Helper function that converts a table cell (TD) to a comparable value
+ * Converts innerHTML to a timestamp, 0 for invalid dates
+ *
+ * @param Cell A TD DOM object
+ * @returns {Number}
+ */
+sortTable.date = function(Cell) {
+    // If okDate library is available, Use it for advanced Date processing
+    if (typeof okDate !== 'undefined') {
+        var kDate = okDate(sortTable.stripTags(Cell.innerHTML));
+        return kDate ? kDate.getTime() : 0;
+    } else {
+        return (new Date(sortTable.stripTags(Cell.innerHTML))).getTime() || 0;
+    }
+};
+
+/**
+ * Helper function that converts a table cell (TD) to a comparable value
+ * Converts innerHTML to a JS Number object
+ *
+ * @param Cell A TD DOM object
+ * @returns {Number}
+ */
+sortTable.number = function(Cell) {
+    return Number(sortTable.stripTags(Cell.innerHTML).replace(/[^-\d.]/g, ''));
+};
+
+/**
+ * Helper function that converts a table cell (TD) to a comparable value
+ * Converts innerHTML to a lower case string for insensitive compare
+ *
+ * @param Cell A TD DOM object
+ * @returns {String}
+ */
+sortTable.string = function(Cell) {
+    return sortTable.stripTags(Cell.innerHTML).toLowerCase();
+};
+
+/**
+ * Helper function that converts a table cell (TD) to a comparable value
+ *
+ * @param Cell A TD DOM object
+ * @returns {String}
+ */
+sortTable.raw = function(Cell) {
+    return Cell.innerHTML;
+};
+
+/**
+ * Helper function that converts a table cell (TD) to a comparable value
+ * Captures the last space-delimited token from innerHTML
+ *
+ * @param Cell A TD DOM object
+ * @returns {String}
+ */
+sortTable.last = function(Cell) {
+    return sortTable.stripTags(Cell.innerHTML).split(' ').pop().toLowerCase();
+};
+
+/**
+ * Helper function that converts a table cell (TD) to a comparable value
+ * Captures the value of the first childNode
+ *
+ * @param Cell A TD DOM object
+ * @returns {String}
+ */
+sortTable.input = function(Cell) {
+    for (var i = 0; i < Cell.children.length; i++) {
+        if ('object' == typeof Cell.children[i]
+            && 'undefined' != typeof Cell.children[i].value
+        ) {
+            return Cell.children[i].value.toLowerCase();
+        }
+    }
+
+    return sortTable.string(Cell);
+};
+
+/**
+ * Helper function that prevents sorting by always returning null
+ *
+ * @param Cell A TD DOM object
+ * @returns null
+ */
+sortTable.none = function(Cell) {
+    return null;
+};
+
+/**
+ * Return the click handler appropriate to the specified Table and column
+ *
+ * @param Table Table to sort
+ * @param col   Column to sort by
+ * @returns {Function} Click Handler
+ */
+sortTable.getClickHandler = function(Table, col) {
+    return function() {
+        sortTable(Table, col);
+    };
+};
+
+/**
+ * Attach sortTable() calls to table header cells' onclick events
+ * If the table(s) do not have a THead node, one will be created around the
+ * first row
+ */
+sortTable.init = function() {
+    var THead, Tables, Handler;
+    if (document.querySelectorAll) {
+        Tables = document.querySelectorAll('table.js-sort-table');
+    } else {
+        Tables = document.getElementsByTagName('table');
+    }
+
+    for (var i = 0; i < Tables.length; i++) {
+        // Because IE<8 doesn't support querySelectorAll, skip unclassed tables
+        if (!document.querySelectorAll && null === Tables[i].className.match(/\bjs-sort-table\b/)) {
+            continue;
+        }
+
+        // Prevent repeat processing
+        if (Tables[i].attributes['data-js-sort-table']) {
+            continue;
+        }
+
+        // Ensure table has a tHead element
+        if (!Tables[i].tHead) {
+            THead = document.createElement('thead');
+            THead.appendChild(Tables[i].rows[0]);
+            Tables[i].insertBefore(THead, Tables[i].children[0]);
+        } else {
+            THead = Tables[i].tHead;
+        }
+
+        // Attach click events to table header
+        for (var rowNum = 0; rowNum < THead.rows.length; rowNum++) {
+            for (var cellNum = 0, colNum = 0; cellNum < THead.rows[rowNum].cells.length; cellNum++) {
+                // Skip headers marked "js-sort-none"
+                if (THead.rows[rowNum].cells[cellNum].className.match(/\bjs-sort-none\b/)) {
+                    continue;
+                }
+                // Define which column the header should invoke sorting for
+                THead.rows[rowNum].cells[cellNum].setAttribute('data-js-sort-colNum', colNum);
+                Handler = sortTable.getClickHandler(Tables[i], colNum);
+                window.addEventListener
+                    ? THead.rows[rowNum].cells[cellNum].addEventListener('click', Handler)
+                    : window.attachEvent && THead.rows[rowNum].cells[cellNum].attachEvent('onclick', Handler);
+                colNum += THead.rows[rowNum].cells[cellNum].colSpan;
+            }
+        }
+
+        // Mark table as processed
+        Tables[i].setAttribute('data-js-sort-table', 'true')
+    }
+
+    // Add default styles as the first style in head so they can be easily overwritten by user styles
+    var element = document.createElement('style');
+    document.head.insertBefore(element, document.head.childNodes[0]);
+    var sheet = element.sheet;
+    sheet.insertRule('table.js-sort-table.js-sort-asc thead tr > .js-sort-active:not(.js-sort-none):after {content: "\\25b2";font-size: 0.7em;padding-left: 3px;line-height: 0.7em;}', 0);
+    sheet.insertRule('table.js-sort-table.js-sort-desc thead tr > .js-sort-active:not(.js-sort-none):after {content: "\\25bc";font-size: 0.7em;padding-left: 3px;line-height: 0.7em;}', 0);
+};
+
+// Run sortTable.init() when the page loads
+window.addEventListener
+    ? window.addEventListener('load', sortTable.init, false)
+    : window.attachEvent && window.attachEvent('onload', sortTable.init)
+    ;
+
+// Shim for IE11's lack of NodeList.prototype.forEach
+if (typeof NodeList.prototype.forEach !== "function") {
+    NodeList.prototype.forEach = Array.prototype.forEach;
+}