What's the proper way to get output after running a command with execute_command? #3452

vitaliy-zvon · 2022-10-12T20:56:42Z

vitaliy-zvon
Oct 12, 2022

At work, we have a docker container running on an ECS Fargate worker.
The container is running the Django Python web framework.
I'm trying to grab a backup of the database, and copy it to my local machine.

I have a command I can run that will either save the database to a file, or send it, in binary format, to stdout.
I haven't found a way to copy a file to the local machine, so I've been using execute_command to run the command that outputs the database to stdout, and trying to save that locally to a file.

Once you run execute_command, you get the streamUrl, and some other info.

I'm not sure if there's another way to get the output of my command, which is hundreds of megabytes in size (it's a large database), but I've been following this general approach to get the binary data to my local machine.

My code looks sort of like this (although this is the simplified version):

import json
import uuid
import boto3
import construct as c
import websocket

Message = c.Struct(
    "HeaderLength" / c.Int32ub,
    "MessageType" / c.PaddedString(32, "ascii"),
    "SchemaVersion" / c.Int32ub,
    "CreateDate" / c.Int64ub,
    "SequenceNumber" / c.Int64sb,
    "Flags" / c.Int64ub,
    "MessageID" / c.Array(16, c.Byte),
    "PayloadDigest" /  c.Array(32, c.Byte),
    "PayloadType" / c.Int32ub,
    "PayloadLength" / c.Int32ub,
    "Payload" / c.Array(c.this.PayloadLength, c.Byte),
)

init_payload = {
    "MessageSchemaVersion": "1.0",
    "RequestId": str(uuid.uuid4()),
    "TokenValue": session["tokenValue"],
}

# The actual command we want to run
command = 'python -c "for i in range(30_000_000):\n    print(i)"'

# A marker that signifies the end of our message
end_marker = "-- END --"
end_marker_command = f'echo -n "{end_marker}"'

result = client.execute_command(
    cluster=CLUSTER,
    task=TASK,
    container=CONTAINER,
    command=f"bash -c '{command}; {end_marker_command}'",
    interactive=True,
)

session = result["session"]
connection = websocket.create_connection(session["streamUrl"])

with open("out", "wb") as f:
    try:
        connection.send(json.dumps(init_payload))

        break_loop = False
        while True:
            response = connection.recv()

            message = Message.parse(response)

            # Decode message
            payload = int_list_to_bytes(message.Payload)

            # Check for end marker
            if payload[-len(end_marker) :] == end_marker.encode("utf-8"):
                payload = payload[: -len(end_marker)]
                break_loop = True

            if "channel_closed" in message.MessageType:
                raise Exception("Channel closed before command output was received")

            print("="*80)
            print(f"HeaderLength: {message.HeaderLength}")
            print(f"MessageType: {message.MessageType}")
            print(f"SchemaVersion: {message.SchemaVersion}")
            print(f"CreateDate: {message.CreateDate}")
            print(f"SequenceNumber: {message.SequenceNumber}")
            print(f"Flags: {message.Flags}")
            print(f"MessageID: {int_list_to_bytes(message.MessageID)}")
            print(f"PayloadDigest: {int_list_to_bytes(message.PayloadDigest)}")
            print(f"PayloadType: {message.PayloadType}")
            print(f"PayloadLength: {message.PayloadLength}")
            print(f"Payload: {payload}")

            f.write(payload)

            if break_loop:
                break

    finally:
        connection.close()

It works for a while (the first few million lines of output), but eventually, the websocket starts missing sequence numbers.
Packets from sequence number 0 to 11999 are all sequential. Then all of a sudden, we get missing packets. After 11999 we get 12012, 12176, 12340, 12503, and so on. It doesn't go back to being properly incrementing until sequence 47341. That's a lot of missing data.

I've been trying to get this to work for several days. How can I either copy a file to my local system (without going through S3), or get hundreds of megabytes of binary output of a command?

I'm guessing there's 2 solutions - to somehow use sessions to get the output, or to read directly from the websocket like I'm doing here. Help with either one is fine with me.

PS. I'm very new to AWS APIs. It took me a while to figure out tasks, clusters, regions, and containers. I'm still not entirely sure how sessions work. If there's a simple, but fully fledged example of how to run a command, and read its large, binary output, including starting a session (or is one started automatically when you login? When you run execute_command?), I would appreciate it.

PPS. I'm aware that my current approach can fail if the end_marker is split up between 2 different messages. If there's a simpler way to know when to stop reading, I'd love to hear it. Otherwise, I can address the bug when I'm no longer missing packets from the stream.

Edit: Is it possible this is some sort of a buffer overflow issue? We're sending a relatively large amount of data. If something is getting buffered somewhere, and the buffer is overflowing, that might explain the lost packets. Interestingly, adding a time.sleep(0.00001) on every iteration of the loop to slow down the generation of data didn't help at all. It still sent exactly the same number of packets in order (11999), but the next packet this time had the sequence number 12075. 11999 is about 11 MB in, until it starts to lose packets, if that means anything.

tim-finnigan · 2022-10-27T22:20:58Z

tim-finnigan
Oct 27, 2022
Collaborator

Hi @vitaliy-zvon thanks for reaching out. I'm not that familiar with this use case but found another Stack Overflow posts which may help: https://stackoverflow.com/questions/68569452/boto3-execute-command-inside-python-script

Have you tried any of the approaches suggested there? Please let us know what else you've attempted and whether it has worked or not.

For more information on Sessions you can refer to this documentation - although I couldn't find a fully fledged documented example involving execute_command.

2 replies

vitaliy-zvon Oct 27, 2022
Author

Hi Tim.

Thanks for your response.

I'm afraid if I write out everything I've tried, it's going to turn into a 10 page post, and no one will bother reading it.
I've been at this for probably close to a month now.
I've tried extracting the output of my command using ssm. It didn't work.
I've tried it using raw websockets, as above. I'm getting missing data.
I've tried it using subprocess, and calling the aws binary. It sort of works, but the aws binary returns a bunch of extra lines before and after my binary data that I need to filter out, which is a pain.

There must be an easier way to do this. I don't use AWS a lot - is this really not a common use case? Running a command, and getting the output?

I've come across that documentation before, and that exact SO post. I tried everything that looked like it might work. Nothing's worked so far.

vitaliy-zvon Nov 13, 2022
Author

I've been at this for over a month.
I just need to run a command on an AWS container, and gets its output.
It works with aws-cli using: aws --region ... ecs execute-command --cluster ... ---task ... --container ... --command ... --interactive. I just need to do this in Python without having to resort to using subprocess to run aws-cli

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

What's the proper way to get output after running a command with execute_command? #3452

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 1 comment 2 replies

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Select a reply

What's the proper way to get output after running a command with execute_command? #3452

vitaliy-zvon Oct 12, 2022

Replies: 1 comment · 2 replies

tim-finnigan Oct 27, 2022 Collaborator

vitaliy-zvon Oct 27, 2022 Author

vitaliy-zvon Nov 13, 2022 Author

vitaliy-zvon
Oct 12, 2022

Replies: 1 comment 2 replies

tim-finnigan
Oct 27, 2022
Collaborator

vitaliy-zvon Oct 27, 2022
Author

vitaliy-zvon Nov 13, 2022
Author