
CLI Codegen design #606

Closed
josephjclark opened this issue Feb 20, 2024 · 5 comments


josephjclark commented Feb 20, 2024

This issue contains a living design for openfn codegen (or whatever we call it), the CLI command which today will generate an adaptor template, but tomorrow may generate other stuff.

Inputs

The command will need to take the following inputs:

  • openAPI Spec
  • The endpoints you want to generate mappings for
  • The folder to output the various generated assets to (actually is this just the monorepo root and adaptor name?)
  • (optional) the model you want to use (name? url?)
  • (optional) an API key for the model
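
The inputs above could be sketched as a TypeScript shape with a minimal validator. Field names here are illustrative, not final CLI flags:

```typescript
// Hypothetical shape for the codegen inputs listed above.
interface CodegenInputs {
  spec: object;        // parsed openAPI spec (JSON)
  endpoints: string[]; // endpoints to generate mappings for
  outputDir: string;   // folder for generated assets
  model?: string;      // optional model name or URL
  apiKey?: string;     // optional API key for the model
}

// A minimal check: the two required inputs must be present.
function validateInputs(input: Partial<CodegenInputs>): string[] {
  const errors: string[] = [];
  if (!input.spec) errors.push('missing openAPI spec');
  if (!input.endpoints || input.endpoints.length === 0)
    errors.push('no endpoints requested');
  return errors;
}
```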

Outputs

The existing AI service will generate the following files:

  • adaptor.d.ts
  • adaptor.js
  • adaptor.test.js

We need to extend it to generate a full adaptor template:

  • src/adaptor.d.ts
  • src/adaptor.js
  • src/index.js
  • test/adaptor.test.js
  • readme.md
  • package.json

What if you point to an existing adaptor? Can we merge the new code with the existing Adaptor.js?

API Spec

Here's what the API might look like:

openfn generate adaptor spec.json

spec.json would contain all the inputs. Presumably they can all be overridden by CLI Args too.

Right now this is:

```
spec: <openAPI spec as JSON>,
instruction: "Create an OpenFn function that allows me to create new trackedEntityInstances via the /trackedEntityInstances endpoint"
model?: "gpt-4"
```

But I would probably like to change "instruction" to "endpoints", and just have a list
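
With that change, the spec might look something like this. This is a sketch of the proposed shape only; none of these field values are agreed:

```typescript
// Hypothetical spec.json once "instruction" becomes an "endpoints" list.
const spec = {
  spec: { openapi: '3.0.0', paths: { '/trackedEntityInstances': {} } },
  endpoints: ['/trackedEntityInstances'],
  model: 'gpt-4', // optional override
};
```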

Changing vs Generating

It's easy to generate a fresh adaptor because we're outputting fresh source code. No toes to step on.

But if this is going to work on an existing adaptor, we have to generate code inside existing code.

Either we do that by injecting generated code into an existing file, perhaps using static analysis tools, or we get the AI to do it (which feels like a risk to me).

I suppose we start by saying: here is the existing adaptor.js file, please generate or amend this function. If that proves reliable, great, we can keep it. Bear in mind that everything is backed by git, so the codegen can't make permanent breaking changes - and changes are easy to see via diff.
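
One non-AI option for the injection route mentioned above: wrap each generated function in marker comments and replace only the marked region on regeneration. A minimal sketch, assuming a marker convention that doesn't exist yet:

```typescript
// Inject (or re-inject) a generated function into existing source
// by replacing the text between hypothetical codegen markers.
function injectGenerated(
  source: string,
  name: string,
  generated: string
): string {
  const start = `// <codegen:${name}>`;
  const end = `// </codegen:${name}>`;
  const block = `${start}\n${generated}\n${end}`;
  const i = source.indexOf(start);
  const j = source.indexOf(end);
  if (i === -1 || j === -1) {
    // No existing block for this function: append it.
    return `${source}\n${block}\n`;
  }
  // Replace the existing marked region, leaving hand-written code alone.
  return source.slice(0, i) + block + source.slice(j + end.length);
}
```

Everything outside the markers is untouched, so hand-written edits survive regeneration - and as noted above, git catches anything that goes wrong.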

Model Abstraction

Should we expose the model to the user?

I think we're building a highly specific service here. I think we should choose a model that works well for the task, maybe fine-tune it (I think we had mixed results on that?) and optimise the prompt for that model.

Maybe we even find that we want to use multiple models in the pipeline.

So I kind of think we should abstract the model, maybe expose it as an option if it helps, but otherwise not put it in the user's hands. This also implies that we own the model's API keys (if required). Maybe users need an API key to use the service generally?

Python Server

The current AI spike is implemented as a python service behind a python http server.

We could re-implement the "frontend" of the service in TypeScript. But ML devs don't want to use TypeScript, so this isn't a very appealing option.

So we probably need to keep the python server. That means we need to host a central server which exposes a bunch of endpoints. Each endpoint should look like api/<task>/<version>, e.g. api/adaptor/1. Input payloads could be quite big if they have to include spec.json files or a prompt, so the endpoint should accept a JSON payload.

The CLI could take a url to the server to call. By default it would call data.openfn.org or something, but you can pass localhost:1234 to use a local dev server.
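
The call path above could be as simple as a POST to the versioned endpoint. The URL, function names and error handling here are assumptions based on this discussion, not a real API:

```typescript
// Build the versioned endpoint URL: api/<task>/<version>
function buildServiceUrl(baseUrl: string, task: string, version: number): string {
  return `${baseUrl}/api/${task}/${version}`;
}

// POST the inputs as JSON - large spec.json files and prompts
// travel in the request body, per the discussion above.
async function callService(
  baseUrl: string,
  task: string,
  version: number,
  payload: object
): Promise<unknown> {
  const res = await fetch(buildServiceUrl(baseUrl, task, version), {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  });
  if (!res.ok) throw new Error(`service error: ${res.status}`);
  return res.json();
}
```

Pointing `baseUrl` at a localhost dev server would then be a one-flag change.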

@github-project-automation github-project-automation bot moved this to New Issues in v2 Feb 20, 2024
@josephjclark josephjclark mentioned this issue Feb 20, 2024
josephjclark commented:

This python server business is interesting.

We are building a little list of data services - like metadata, docgen, and now codegen.

We've talked about hosting some of these in Lightning, but I'm not entirely sure they're a Lightning concern.

An obvious difficulty is that we would want a node or elixir server to serve metadata and docgen, but the AI community would likely want a python server to service any codegen calls.

One benefit of hosting it on http of course is that we can map different endpoints to different servers.

I might add a command to the CLI which is a bit more dev-focused and which calls out to the data server. So we'll have `openfn generate adaptor ./spec.json`, but maybe we'll also have `openfn call gen/adaptor ./spec.json`, where `call` is a low-level mapping to an arbitrary endpoint on the web server. This makes it easy for a dev to add a novel function to the service and test it from the CLI.

@josephjclark josephjclark changed the title [WIP] CLI Codegen design CLI Codegen design Feb 21, 2024
@josephjclark josephjclark moved this from New Issues to In progress in v2 Feb 21, 2024
@josephjclark josephjclark mentioned this issue Feb 21, 2024
@josephjclark josephjclark moved this from In progress to Blocked in v2 Feb 22, 2024
christad92 commented:

@josephjclark what is blocking this issue? Is this related to OpenFn/adaptors#19?

josephjclark commented:

@christad92 no - this issue is for AI-driven template generation. It's not blocked at the moment; I'll be picking it up next week.

The other ticket is a simple static template generation utility. Useful but not particularly interesting.

@christad92 christad92 moved this from Blocked to Backlog in v2 Mar 27, 2024
@josephjclark josephjclark moved this from Backlog to Ready in v2 Apr 3, 2024
@christad92 christad92 moved this from Ready to Backlog in v2 Apr 4, 2024
josephjclark commented:

Notes on the CLI:

There will be two CLI commands - a specific generate template and a general apollo.

The apollo one will call out to general server-side services - so you can still trigger template generation by going to the endpoint directly.

The apollo command should do some useful stuff, like:

  • resolving paths in a local JSON file (we should detect paths like `"data": "./spec.json"`), so that it's easy to work with large JSON files
  • loading env vars (strings of the form `$OPENAI_API_KEY`)
  • loading a js file that exports JSON (or a promise that returns JSON)
  • writing outputs to files on disk (a good reason for the resulting JSON to have a convention like `files: { "a.js": '', "b.md": '' }`)
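
The path and env-var resolution, plus the `files:` output convention, could look roughly like this. The detection rules (leading `./`, leading `$`) are assumptions, not the final CLI behaviour:

```typescript
import * as fs from 'node:fs';
import * as path from 'node:path';

// Pre-process a payload: values that look like local paths are
// replaced with file contents; values like $OPENAI_API_KEY are
// replaced from the environment; everything else passes through.
function resolvePayload(
  payload: Record<string, string>,
  env: Record<string, string | undefined> = process.env
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [key, value] of Object.entries(payload)) {
    if (value.startsWith('./') || value.startsWith('../')) {
      out[key] = fs.readFileSync(path.resolve(value), 'utf8');
    } else if (value.startsWith('$')) {
      out[key] = env[value.slice(1)] ?? '';
    } else {
      out[key] = value;
    }
  }
  return out;
}

// Write a `files: { name: contents }` result to disk.
function writeFiles(dir: string, files: Record<string, string>) {
  for (const [name, contents] of Object.entries(files)) {
    fs.writeFileSync(path.join(dir, name), contents);
  }
}
```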

@josephjclark josephjclark mentioned this issue Apr 26, 2024
josephjclark commented:

Most of this work is done.

Instead of a codegen command, I'm so far only supporting a generic apollo command. I might optimise for codegen later with a slicker API, but it's not needed right now.

@github-project-automation github-project-automation bot moved this from Backlog to Done in v2 May 28, 2024