Update index.mdx
Fix grammar
StevenyzZhang authored Oct 21, 2024
1 parent 0262b03 commit 61ffde5
Showing 1 changed file with 4 additions and 4 deletions.
8 changes: 4 additions & 4 deletions src/pages/index.mdx
@@ -88,17 +88,17 @@ import exp4 from "../assets/exp4.png";
{/* <div class="max-w-[60rem]"> */}
<Video source={demo} />

<div class="bg-gray-100 max-w-[50rem]">
<div class="bg-gray-100 max-w-[60rem]">
<h2 class="text-center text-3xl font-bold">Abstract</h2>
<div class="p-4 max-w-[60rem] m-auto">
{/* ## Abstract */}
-Sketches are a natural and accessible medium for UI designers to conceptualize early-stage ideas. However, existing research on UI/UX automation often requires high-fidelity inputs like Figma designs or detailed screenshots, limiting accessibility and impeding efficient design iteration. To bridge this gap, we introduce Sketch2Code, a benchmark that evaluates state-of-the-art Vision Language Models (VLMs) on automating the conversion of rudimentary sketches into webpage prototypes. Beyond end-to-end benchmarking, Sketch2Code supports interactive agent evaluation that mimics real-world design workflows, where a VLM-based agent iteratively refines its generations by communicating with a simulated user, either passively receiving feedback instructions or proactively asking clarification questions. We comprehensively analyze ten commercial and open-source models, showing that Sketch2Code is challenging for existing VLMs; even the most capable models struggle to accurately interpret sketches and formulate effective questions that lead to steady improvement. Nevertheless, a user study with UI/UX experts reveals a significant preference for proactive question-asking over passive feedback reception, highlighting the need to develop more effective paradigms for multi-turn conversational agents
+Sketches are a natural and accessible medium for UI designers to conceptualize early-stage ideas. However, existing research on UI/UX automation often requires high-fidelity inputs like Figma designs or detailed screenshots, limiting accessibility and impeding efficient design iteration. To bridge this gap, we introduce Sketch2Code, a benchmark that evaluates state-of-the-art Vision Language Models (VLMs) on automating the conversion of rudimentary sketches into webpage prototypes. Beyond end-to-end benchmarking, Sketch2Code supports interactive agent evaluation that mimics real-world design workflows, where a VLM-based agent iteratively refines its generations by communicating with a simulated user, either passively receiving feedback instructions or proactively asking clarification questions. We comprehensively analyze ten commercial and open-source models, showing that Sketch2Code is challenging for existing VLMs; even the most capable models struggle to accurately interpret sketches and formulate effective questions that lead to steady improvement. Nevertheless, a user study with UI/UX experts reveals a significant preference for proactive question-asking over passive feedback reception, highlighting the need to develop more effective paradigms for multi-turn conversational agents.
</div>
</div>

## Overview

-Sketch2Code consists of 731 human-drawn sketches paired with 484 real-world webpages with varying levels of precisions and drawing styles. To mirror realistic design workflows and study how well VLMs can interact with humans, our framework further introduces two multi-turn evaluation scenarios between a sketch2code agent and a human/simulated user: (1) the sketch2code agent follows feedback from the user (***feedback following***) and (2) the sketch2code agent proactively asks the user questions for design details and clarification (***question asking***). To this end, our framework assesses not only the ability of models to generate initial implementations based on abstract inputs but also their capacity to adapt and evolve these implementations in response to user feedback.
+Sketch2Code consists of 731 human-drawn sketches paired with 484 real-world webpages with varying levels of precision and drawing styles. To mirror realistic design workflows and study how well VLMs can interact with humans, our framework further introduces two multi-turn evaluation scenarios between a sketch2code agent and a human/simulated user: (1) the sketch2code agent follows feedback from the user (***feedback following***) and (2) the sketch2code agent proactively asks the user questions for design details and clarification (***question asking***). To this end, our framework assesses not only the ability of models to generate initial implementations based on abstract inputs but also their capacity to adapt and evolve these implementations in response to user feedback.

<Figure
caption="Benchmark Overview: direct generation (left) and multi-turn interactive evaluations (right)"
@@ -172,4 +172,4 @@ We find that all models displayed noticeable improvements in feedback following.
year = "2024",
howpublished = "arXiv preprint",
}
-```
+```
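
The Overview paragraph in the diff above describes two multi-turn evaluation loops between a sketch2code agent and a simulated user (***feedback following*** and ***question asking***). Below is a minimal Python sketch of those loops, assuming hypothetical `Sketch2CodeAgent` and `SimulatedUser` interfaces; the class and method names are illustrative, not the benchmark's actual API.

```python
from typing import Protocol


class Sketch2CodeAgent(Protocol):
    """Hypothetical interface for a VLM-based sketch-to-code agent."""
    def generate(self, sketch: bytes) -> str: ...   # sketch image -> HTML
    def revise(self, sketch: bytes, html: str, message: str) -> str: ...
    def ask_question(self, sketch: bytes, html: str) -> str: ...


class SimulatedUser(Protocol):
    """Hypothetical interface for the simulated user."""
    def give_feedback(self, sketch: bytes, html: str) -> str: ...
    def answer(self, sketch: bytes, question: str) -> str: ...


def feedback_following(agent: Sketch2CodeAgent, user: SimulatedUser,
                       sketch: bytes, turns: int = 3) -> str:
    """Scenario 1: the agent passively revises after each round of user feedback."""
    html = agent.generate(sketch)                    # initial implementation
    for _ in range(turns):
        feedback = user.give_feedback(sketch, html)
        html = agent.revise(sketch, html, feedback)
    return html


def question_asking(agent: Sketch2CodeAgent, user: SimulatedUser,
                    sketch: bytes, turns: int = 3) -> str:
    """Scenario 2: the agent proactively asks a clarification question each turn."""
    html = agent.generate(sketch)
    for _ in range(turns):
        question = agent.ask_question(sketch, html)
        html = agent.revise(sketch, html, user.answer(sketch, question))
    return html
```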
