
CLI Codegen design #606

Closed
josephjclark opened this issue Feb 20, 2024 · 5 comments


josephjclark commented Feb 20, 2024

This issue contains a living design for openfn codegen (or whatever we call it), the CLI command which today will generate an adaptor template, but tomorrow may generate other stuff.

Inputs

The command will need to take the following inputs:

  • openAPI Spec
  • The endpoints you want to generate mappings for
  • The folder to output the various generated assets to (actually is this just the monorepo root and adaptor name?)
  • (optional) the model you want to use (name? url?)
  • (optional) an API key for the model
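
The inputs above could be sketched as a TypeScript shape with a minimal validator. Field names here are illustrative, not final CLI flags:

```typescript
// Hypothetical shape for the codegen inputs listed above.
interface CodegenInputs {
  spec: object;        // parsed openAPI spec (JSON)
  endpoints: string[]; // endpoints to generate mappings for
  outputDir: string;   // folder for generated assets
  model?: string;      // optional model name or URL
  apiKey?: string;     // optional API key for the model
}

// A minimal check: the two required inputs must be present.
function validateInputs(input: Partial<CodegenInputs>): string[] {
  const errors: string[] = [];
  if (!input.spec) errors.push('missing openAPI spec');
  if (!input.endpoints || input.endpoints.length === 0)
    errors.push('no endpoints requested');
  return errors;
}
```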

Outputs

The existing AI service will generate the following files:

  • adaptor.d.ts
  • adaptor.js
  • adaptor.test.js

We need to extend it to generate a full adaptor template:

  • src/adaptor.d.ts
  • src/adaptor.js
  • src/index.js
  • test/adaptor.test.js
  • readme.md
  • package.json

What if you point to an existing adaptor? Can we merge the new code with the existing Adaptor.js?

API Spec

Here's what the API might look like:

openfn generate adaptor spec.json

spec.json would contain all the inputs. Presumably they can all be overridden by CLI Args too.

Right now this is:

```
spec: <openAPI spec as JSON>,
instruction: "Create an OpenFn function that allows me to create new trackedEntityInstances via the /trackedEntityInstances endpoint"
model?: "gpt-4"
```

But I would probably like to change "instruction" to "endpoints", and just have a list
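
With that change, the spec might look something like this. This is a sketch of the proposed shape only; none of these field values are agreed:

```typescript
// Hypothetical spec.json once "instruction" becomes an "endpoints" list.
const spec = {
  spec: { openapi: '3.0.0', paths: { '/trackedEntityInstances': {} } },
  endpoints: ['/trackedEntityInstances'],
  model: 'gpt-4', // optional override
};
```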

Changing vs Generating

It's easy to generate a fresh adaptor because we're outputting fresh source code. No toes to step on.

But if this is going to work on an existing adaptor, we have to generate code inside existing code.

Either we do that by injecting generated code into an existing file, perhaps using static analysis tools, or we get the AI to do it (which feels like a risk to me).

I suppose we start by saying: here is the existing adaptor.js file, please generate or amend this function. If that proves reliable, great, we can keep it. Bear in mind that everything is backed by git, so the codegen can't make permanent breaking changes - and changes are easy to see via diff.
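
One non-AI option for the injection route mentioned above: wrap each generated function in marker comments and replace only the marked region on regeneration. A minimal sketch, assuming a marker convention that doesn't exist yet:

```typescript
// Inject (or re-inject) a generated function into existing source
// by replacing the text between hypothetical codegen markers.
function injectGenerated(
  source: string,
  name: string,
  generated: string
): string {
  const start = `// <codegen:${name}>`;
  const end = `// </codegen:${name}>`;
  const block = `${start}\n${generated}\n${end}`;
  const i = source.indexOf(start);
  const j = source.indexOf(end);
  if (i === -1 || j === -1) {
    // No existing block for this function: append it.
    return `${source}\n${block}\n`;
  }
  // Replace the existing marked region, leaving hand-written code alone.
  return source.slice(0, i) + block + source.slice(j + end.length);
}
```

Everything outside the markers is untouched, so hand-written edits survive regeneration - and as noted above, git catches anything that goes wrong.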

Model Abstraction

Should we expose the model to the user?

I think we're building a highly specific service here. I think we should choose a model that works well for the task, maybe fine-tune it (I think we had mixed results on that?) and optimise the prompt for that model.

Maybe we even find that we want to use multiple models in the pipeline.

So I kind of think we should abstract the model, maybe expose it as an option if it helps, but otherwise not put it in the user's hands. This also implies that we own the model's API keys (if required). Maybe users need an API key to use the service generally?

Python Server

The current AI spike is implemented as a python service behind a python http server.

We could re-implement the "frontend" of the service in TypeScript. But ML devs don't want to use TypeScript, so this isn't a very appealing option.

So we probably need to keep the python server. That means we need to host a central server which exposes a bunch of endpoints. Each endpoint should look like api/<task>/<version>, e.g. api/adaptor/1. Input payloads could be quite big if they have to include spec.json files or a prompt, so the endpoint should accept a JSON payload.

The CLI could take a url to the server to call. By default it would call data.openfn.org or something, but you can pass localhost:1234 to use a local dev server.
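
The call path above could be as simple as a POST to the versioned endpoint. The URL, function names and error handling here are assumptions based on this discussion, not a real API:

```typescript
// Build the versioned endpoint URL: api/<task>/<version>
function buildServiceUrl(baseUrl: string, task: string, version: number): string {
  return `${baseUrl}/api/${task}/${version}`;
}

// POST the inputs as JSON - large spec.json files and prompts
// travel in the request body, per the discussion above.
async function callService(
  baseUrl: string,
  task: string,
  version: number,
  payload: object
): Promise<unknown> {
  const res = await fetch(buildServiceUrl(baseUrl, task, version), {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  });
  if (!res.ok) throw new Error(`service error: ${res.status}`);
  return res.json();
}
```

Pointing `baseUrl` at a localhost dev server would then be a one-flag change.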

@github-project-automation github-project-automation bot moved this to New Issues in v2 Feb 20, 2024
@josephjclark josephjclark mentioned this issue Feb 20, 2024
josephjclark commented:

This python server business is interesting.

We are building a little list of data services - like metadata, docgen, and now codegen.

We've talked about hosting some of these in Lightning, but I'm not entirely sure they're a Lightning concern.

An obvious difficulty is that we would want a node or elixir server to serve metadata and docgen, but the AI community would likely want a python server to service any codegen calls.

One benefit of hosting it on http of course is that we can map different endpoints to different servers.

I might add a command to the CLI which is a bit more dev-focused and which calls out to the data server. So we'll have `openfn generate adaptor ./spec.json`, but maybe we'll also have `openfn call gen/adaptor ./spec.json`, where `call` is a low-level mapping to an arbitrary endpoint on the web server. This makes it easy for a dev to add a novel function to the service and test it from the CLI.

@josephjclark josephjclark changed the title [WIP] CLI Codegen design CLI Codegen design Feb 21, 2024
@josephjclark josephjclark moved this from New Issues to In progress in v2 Feb 21, 2024
@josephjclark josephjclark mentioned this issue Feb 21, 2024
@josephjclark josephjclark moved this from In progress to Blocked in v2 Feb 22, 2024
christad92 commented:

@josephjclark what is blocking this issue? Is this related to OpenFn/adaptors#19?

josephjclark commented:

@christad92 no - this issue is for AI-driven template generation. It's not blocked at the moment; I'll be picking it up next week.

The other ticket is a simple static template generation utility. Useful but not particularly interesting.

@christad92 christad92 moved this from Blocked to Backlog in v2 Mar 27, 2024
@josephjclark josephjclark moved this from Backlog to Ready in v2 Apr 3, 2024
@christad92 christad92 moved this from Ready to Backlog in v2 Apr 4, 2024
josephjclark commented:

Notes on the CLI:

There will be two CLI commands - a specific generate template and a general apollo.

The apollo one will call out to general server-side services - so you can still trigger template generation by going to the endpoint directly.

The apollo command should do some useful stuff, like:

  • resolving paths in a local JSON file (we should detect paths like `"data": "./spec.json"`), so that it's easy to work with large JSON files
  • loading env vars (strings of the form `$OPENAI_API_KEY`)
  • loading a js file that exports JSON (or a promise that returns JSON)
  • writing outputs to files on disk (a good reason for the resulting JSON to have a convention like `files: { "a.js": '', "b.md": '' }`)
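
The path and env-var resolution, plus the `files:` output convention, could look roughly like this. The detection rules (leading `./`, leading `$`) are assumptions, not the final CLI behaviour:

```typescript
import * as fs from 'node:fs';
import * as path from 'node:path';

// Pre-process a payload: values that look like local paths are
// replaced with file contents; values like $OPENAI_API_KEY are
// replaced from the environment; everything else passes through.
function resolvePayload(
  payload: Record<string, string>,
  env: Record<string, string | undefined> = process.env
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [key, value] of Object.entries(payload)) {
    if (value.startsWith('./') || value.startsWith('../')) {
      out[key] = fs.readFileSync(path.resolve(value), 'utf8');
    } else if (value.startsWith('$')) {
      out[key] = env[value.slice(1)] ?? '';
    } else {
      out[key] = value;
    }
  }
  return out;
}

// Write a `files: { name: contents }` result to disk.
function writeFiles(dir: string, files: Record<string, string>) {
  for (const [name, contents] of Object.entries(files)) {
    fs.writeFileSync(path.join(dir, name), contents);
  }
}
```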

@josephjclark josephjclark mentioned this issue Apr 26, 2024
josephjclark commented:

Most of this work is done.

Instead of a codegen command, I'm so far only supporting a generic apollo command. I might optimise for codegen later with a slicker API, but it's not needed right now.

@github-project-automation github-project-automation bot moved this from Backlog to Done in v2 May 28, 2024