Record logs into step artifacts #1339
Conversation
👍 Looks good to me! Reviewed everything up to b2cb9c4 in 2 minutes and 53 seconds.

More details
- Looked at 196 lines of code in 8 files.
- Skipped 0 files when reviewing.
- Skipped posting 1 drafted comment based on config settings.
1. skyvern/forge/agent.py:1778
- Draft comment: The try-except block for recording the Skyvern log is redundant here, as it is already handled in the `update_step` method. Consider removing this block to avoid code duplication.
- Reason this comment was not posted: Decided after close inspection that this draft comment was likely wrong and/or not actionable. The comment claims redundancy, but the diff does not show any evidence that the `update_step` method handles the same logging functionality. Without strong evidence of redundancy, the comment should be removed; it does not provide actionable or clear evidence of the issue. I might be missing the full implementation details of the `update_step` method, which could potentially handle the logging; however, the diff does not provide this information, so I should not assume it. Remove the comment, as it lacks strong evidence of redundancy in the try-except block for logging.
I don't think JSON logs are useful to the reader. Can we generate 2 artifacts instead?
skyvern/forge/agent.py (outdated)

```python
log = skyvern_context.current().log
current_step_log = [entry for entry in log if entry.get("step_id", "") == step.step_id]
log_json = json.dumps(current_step_log, cls=SkyvernLogEncoder, indent=2)
await app.ARTIFACT_MANAGER.create_artifact(
```
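The hunk above serializes the step's log entries with a custom encoder. The `SkyvernLogEncoder` implementation isn't shown in this thread; a minimal sketch of what such an encoder could look like (the specific types handled here are assumptions, not the merged code):

```python
import json
from datetime import datetime
from enum import Enum


class SkyvernLogEncoder(json.JSONEncoder):
    """Sketch of a JSON encoder for structured log entries.

    Handles values json.dumps cannot serialize natively (datetimes,
    enums, arbitrary objects). The exact types handled by the real
    encoder are assumptions.
    """

    def default(self, o: object) -> object:
        if isinstance(o, datetime):
            return o.isoformat()
        if isinstance(o, Enum):
            return o.value
        # Fall back to the string representation for anything else.
        return str(o)


log_entries = [
    {"event": "Starting agent step", "timestamp": datetime(2024, 12, 9, 11, 9, 19)}
]
print(json.dumps(log_entries, cls=SkyvernLogEncoder, indent=2))
```

Anything the encoder doesn't recognize is stringified rather than raising, so a single malformed entry can't break the whole artifact dump.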
Is there a way we can also capture task and workflow level logs? Even if we don't render them in the UI today
We historically associate logs like that w/ the first step, although @wintonzheng is currently overhauling it
yeah we just added new columns to the artifacts db: https://github.com/Skyvern-AI/skyvern/blob/main/skyvern/forge/sdk/db/models.py#L169-L174
https://github.com/Skyvern-AI/skyvern/pull/1345/files
here's the example of supporting artifact upload for observer thought.
going forward we can add support for task level and workflow run level artifacts
skyvern/forge/agent.py (outdated)

```python
try:
    log = skyvern_context.current().log
    current_step_log = [entry for entry in log if entry.get("step_id", "") == step.step_id]
```
this PR is a good start.
There are a couple of problems I'm seeing right now (not because of the code here) that we need to keep iterating on:
- sometimes we don't have step_id in a log (example). TODO: we need to add step_id to all the logs
- for each task in a workflow run, we don't set the task_id in the SkyvernContext before the task starts. This means many logs won't have task_id - if we want to store logs according to the "task_id" field in the JSON. TODO: set task_id in SkyvernContext before a task starts and reset it after a task ends (completed, failed, terminated, canceled)
> TODO: we need to add step_id to all the logs

@wintonzheng are we certain that each log must have a step_id?
From what I see in the logs when I ran tasks, there are several logs in the beginning that don't belong to any step:

```
2024-12-09T11:09:09.220558Z [info ] Created new task organization_id=o_333794107196816612 proxy_location=RESIDENTIAL task_id=tsk_335262355392558810 url=https://www.nytimes.com/books/best-sellers
2024-12-09T11:09:09.220672Z [info ] Executing task using background task executor task_id=tsk_335262355392558810
2024-12-09T11:09:09.266772Z [info ] Creating browser state for task task_id=tsk_335262355392558810
2024-12-09T11:09:12.092991Z [info ] browser console log is saved log_path=./log/2024-12-09/95733df0-f2d8-428d-89ce-f71221f0cbe1.log
2024-12-09T11:09:12.326437Z [info ] Trying to navigate to https://www.nytimes.com/books/best-sellers and waiting for 5 seconds. retry_time=0 url=https://www.nytimes.com/books/best-sellers
2024-12-09T11:09:14.148604Z [info ] Page loading time loading_time=1.8220469951629639 url=https://www.nytimes.com/books/best-sellers
2024-12-09T11:09:19.148456Z [info ] Successfully went to https://www.nytimes.com/books/best-sellers retry_time=0 url=https://www.nytimes.com/books/best-sellers
2024-12-09T11:09:19.162015Z [info ] Starting agent step step_id=stp_335262355392558812 step_order=0 step_retry=0 task_id=tsk_335262355392558810
```

Here the first 7 logs would be without a step because the first step is not created yet.
@wintonzheng do I understand correctly that we want to create task and step ids in advance?
This way the logs that are currently not associated with a step or task, because the step/task is not created yet, would be associated with the first task and first step?
> @wintonzheng are we certain that each log must have a step_id?

No, step_id needs to be manually defined in `LOG.info("xxx", step_id="stp_xxxx")` nowadays. Those logs you shared here don't belong to any step, so it's okay that many logs don't have step_id.

> @wintonzheng do I understand correctly that we want to create task and step ids in advance? This way the logs that are currently not associated with a step or task because the step/task is not created yet would be associated with the first task and first step?

Yep, exactly what I mean. We need to backfill step_id and task_id in some logs.
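Under the assumption that orphan entries should simply be attributed to the first step of their task, the backfill pass discussed above could be as small as this (helper name and log shape are illustrative, not merged code):

```python
def backfill_step_ids(log: list[dict], first_step_id: str) -> list[dict]:
    """Assign log entries that predate step creation to the first step.

    Entries that already carry a step_id are left untouched; only orphan
    entries (task creation, browser setup, navigation, ...) are filled in.
    """
    return [
        entry if entry.get("step_id") else {**entry, "step_id": first_step_id}
        for entry in log
    ]


log = [
    {"event": "Created new task", "task_id": "tsk_1"},
    {"event": "Creating browser state for task", "task_id": "tsk_1"},
    {"event": "Starting agent step", "task_id": "tsk_1", "step_id": "stp_1"},
]
backfilled = backfill_step_ids(log, "stp_1")
```

The pass is non-destructive: it returns fresh dicts for the entries it changes, so the original log list in the context is never mutated in place.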
@suchintan here are logs in text format. I'm now generating both text and JSON artifacts; I'm only showing the text logs in the UI for now.
This is perfect
```diff
@@ -10,6 +10,8 @@
     ArtifactType.SCREENSHOT_LLM: "png",
     ArtifactType.SCREENSHOT_ACTION: "png",
     ArtifactType.SCREENSHOT_FINAL: "png",
+    ArtifactType.SKYVERN_LOG: "log",
```
Step log..
Task log..
Workflow log...
I don't think adding STEP_LOG, TASK_LOG, WORKFLOW_LOG helps.
It's better to use the step_id, task_id and workflow_run_id columns to filter artifacts. Also, step log and task log are not mutually exclusive: TaskLogs should be the superset of StepLogs, and WorkflowRunLogs should be the superset of TaskLogs. We don't have to create multiple sets of log artifacts that have duplicated data.
I'm not sure if I agree with this -- why not have multiple artifact types that may or may not include duplicate data? It makes the implementation + writing of data + reading of data dead simple.
Why is it better to use step_id / task_id / workflow_run_id columns to filter artifacts w/ the same name + type?
If I understand correctly, in this case we'll store workflow and task level logs in the step artifact folders.
The current implementation of `build_uri` always assumes a step:

```python
def build_uri(self, artifact_id: str, step: Step, artifact_type: ArtifactType) -> str:
    file_ext = FILE_EXTENTSION_MAP[artifact_type]
    return f"file://{self.artifact_path}/{step.task_id}/{step.order:02d}_{step.retry_index}_{step.step_id}/{datetime.utcnow().isoformat()}_{artifact_id}_{artifact_type}.{file_ext}"
```

We could define a build uri method for logs that would take an entity type and id, so that the uri would look like this:

```python
return f"file://{self.artifact_path}/{entity_type}_{entity_id}/{datetime.utcnow().isoformat()}_{artifact_type}.{file_ext}"
```

This way we would be able to fetch logs on the frontend using a similar query as for the other artifacts, and we would be able to use one artifact type for all the levels.
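Putting the commenter's two fragments together, the proposed entity-aware builder could look like this as a standalone sketch (the function name `build_log_uri` is the proposal under discussion, not merged code; the extension map is reduced to one entry for illustration):

```python
from datetime import datetime

# Reduced stand-in for the real map in storage/base.py (spelling kept
# as it appears in the codebase snippet above).
FILE_EXTENTSION_MAP = {"skyvern_log": "log"}


def build_log_uri(
    artifact_path: str, entity_type: str, entity_id: str, artifact_type: str
) -> str:
    """Build a log artifact URI keyed by entity type and id, so the same
    artifact type can serve step, task, and workflow_run levels."""
    file_ext = FILE_EXTENTSION_MAP[artifact_type]
    return (
        f"file://{artifact_path}/{entity_type}_{entity_id}/"
        f"{datetime.utcnow().isoformat()}_{artifact_type}.{file_ext}"
    )


uri = build_log_uri("/tmp/artifacts", "task", "tsk_123", "skyvern_log")
print(uri)
```

The folder name `{entity_type}_{entity_id}` keeps task and workflow logs out of the step folders while staying query-friendly for the frontend.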
@suchintan no concerns about data duplication. You're right, having more artifact_types makes the implementation easier.
```diff
@@ -65,11 +63,11 @@ function ScrollableActionList({
         onClick={() => onActiveIndexChange(i)}
         onMouseEnter={() => {
           queryClient.prefetchQuery({
-            queryKey: ["task", taskId, "steps", action.stepId, "artifacts"],
+            queryKey: ["step", action.stepId, "artifacts"],
```
@msalihaltun can you review?
```python
LOG = structlog.get_logger()


def primary_key_from_log_entity_type(log_entity_type: LogEntityType) -> str:
    if log_entity_type == LogEntityType.STEP:
```
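The helper in this hunk is cut off after its first branch. A complete version, under the assumption that `LogEntityType` covers the three levels discussed in this thread (step, task, workflow run), might look like:

```python
from enum import Enum


class LogEntityType(str, Enum):
    # Assumed members, matching the artifact id columns discussed above.
    STEP = "step"
    TASK = "task"
    WORKFLOW_RUN = "workflow_run"


def primary_key_from_log_entity_type(log_entity_type: LogEntityType) -> str:
    """Map a log entity type to the artifact column used as its key."""
    if log_entity_type == LogEntityType.STEP:
        return "step_id"
    if log_entity_type == LogEntityType.TASK:
        return "task_id"
    if log_entity_type == LogEntityType.WORKFLOW_RUN:
        return "workflow_run_id"
    raise ValueError(f"Unsupported log entity type: {log_entity_type}")
```

Raising on an unknown member makes it loud when a new entity type is added to the enum without a matching key column.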
This is great
Made a couple of small comments. Code looks great.
```python
from skyvern.forge.sdk.db.id import generate_artifact_id
from skyvern.forge.sdk.models import Step
from skyvern.forge.sdk.schemas.observers import ObserverCruise, ObserverThought

LOG = structlog.get_logger(__name__)

PRIMARY_KEY = Literal[
```
@LawyZheng we're introducing new primary keys for artifacts.
Now I think we need to ensure the aio tasks for the new primary keys (workflow_run_id, step_id) will be awaited at cleanup time.
Typing fix here: 63829fa. Feel free to pick that commit and we can merge this PR.
Recording step execution logs into artifacts.

During step, task and workflow_run execution we add log entries into the `log` field of the context. Once the step, task or workflow_run state is updated, we record the entries into an artifact. If there already is a log artifact for the given entity, it will be updated instead of creating a new one.

Assumptions
- The log artifact is recorded in the `{entity_type}_update` method.

Log Artifacts API Access
I've introduced a new endpoint to access artifacts for different entities:
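The endpoint implementation isn't shown in this description. A framework-agnostic sketch of the lookup such a route would perform (the in-memory store, function name, and routing are assumptions; the real endpoint queries the artifacts table):

```python
import asyncio
from enum import Enum


class EntityType(str, Enum):
    STEP = "step"
    TASK = "task"
    WORKFLOW_RUN = "workflow_run"


# Hypothetical in-memory stand-in for the artifacts table.
ARTIFACTS = [
    {"artifact_id": "a1", "step_id": "stp_1", "task_id": "tsk_1"},
    {"artifact_id": "a2", "step_id": "stp_2", "task_id": "tsk_1"},
]

_KEY_BY_ENTITY = {
    EntityType.STEP: "step_id",
    EntityType.TASK: "task_id",
    EntityType.WORKFLOW_RUN: "workflow_run_id",
}


async def get_entity_artifacts(entity_type: EntityType, entity_id: str) -> list[dict]:
    """Return artifacts whose id column matches the requested entity."""
    key = _KEY_BY_ENTITY[entity_type]
    return [a for a in ARTIFACTS if a.get(key) == entity_id]


print(asyncio.run(get_entity_artifacts(EntityType.TASK, "tsk_1")))
```

Because a task's artifacts are the union of its steps' artifacts, the same handler serves every level just by switching which id column it filters on.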
Storing Log Artifact Files
Testing plan
Screenshots
Successful run:
Failing run:
Updates
Added text log formatter:
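The formatter itself isn't shown here. A minimal sketch of rendering the same structured entries as text lines, in the style of the log samples earlier in the thread (the exact format and field names are assumptions):

```python
def format_log_entry(entry: dict) -> str:
    """Render one structured log entry as a human-readable line,
    e.g. `2024-12-09T11:09:19Z [info] message key=value`."""
    timestamp = entry.get("timestamp", "")
    level = entry.get("level", "info")
    event = entry.get("event", "")
    extras = " ".join(
        f"{key}={value}"
        for key, value in sorted(entry.items())
        if key not in ("timestamp", "level", "event")
    )
    return f"{timestamp} [{level}] {event} {extras}".rstrip()


line = format_log_entry({
    "timestamp": "2024-12-09T11:09:19Z",
    "level": "info",
    "event": "Starting agent step",
    "step_id": "stp_335262355392558812",
})
print(line)
```

Generating the text artifact from the same entries as the JSON artifact keeps the two views guaranteed-consistent, since neither is parsed from the other.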
Important

This PR adds functionality to record step execution logs as `SkyvernLog` artifacts in JSON format, with updates to logging, context, and artifact management.

- Logs are added to the `log` field of `SkyvernContext` and stored as `SkyvernLog` artifacts, encoded via `SkyvernLogEncoder`.
- Added `SkyvernLog` to `ArtifactType` in `models.py` and `types.ts`.
- Updated `FILE_EXTENSION_MAP` in `storage/base.py` to include `SkyvernLog` with `json` extension.
- Added `skyvern_logs_processor` in `forge_log.py` to append logs to `SkyvernContext`.
- Integrated `skyvern_logs_processor` into `setup_logger()`.
- Updated `StepArtifacts.tsx` to display `SkyvernLog` artifacts in the UI.
- Added `SkyvernLogEncoder` in `skyvern_log_encoder.py` for custom JSON encoding of logs.

This description was created by for b2cb9c4. It will automatically update as commits are pushed.
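The `skyvern_logs_processor` mentioned in the summary can be sketched without importing structlog at all: a structlog processor is any callable taking `(logger, method_name, event_dict)` and returning the event dict. The wiring to the real `SkyvernContext` is an assumption here; a module-level list stands in for it:

```python
# In-memory stand-in for SkyvernContext.log.
CONTEXT_LOG: list[dict] = []


def skyvern_logs_processor(logger, method_name: str, event_dict: dict) -> dict:
    """Append a copy of every log event to the context's log list."""
    entry = dict(event_dict)      # copy so later processors can't mutate it
    entry["level"] = method_name  # record the log level (info, warning, ...)
    CONTEXT_LOG.append(entry)
    return event_dict             # pass the event through unchanged


# With structlog installed, it would be registered roughly as:
#   structlog.configure(processors=[skyvern_logs_processor, ...])
skyvern_logs_processor(None, "info", {"event": "Created new task", "task_id": "tsk_1"})
```

Returning the untouched `event_dict` keeps the processor transparent to the rest of the chain, so adding it to `setup_logger()` changes nothing about what gets printed.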