
Commit

Update HTML parsing APIs and script references, introduce docify script, and remove unnecessary cache flags
pelikhan committed Aug 30, 2024
1 parent 7c39bb1 commit 4a1b76f
Showing 7 changed files with 102 additions and 18 deletions.
43 changes: 43 additions & 0 deletions docs/src/content/docs/reference/scripts/html.md
@@ -0,0 +1,43 @@
---
title: HTML
description: Learn how to use HTML parsing functions in GenAIScript for effective content manipulation and data extraction.
keywords: HTML parsing, content manipulation, data extraction, HTML to text, HTML to markdown
sidebar:
order: 18
---

# HTML in GenAIScript

GenAIScript provides HTML processing functions on the global `HTML` object, so you can extract text and data from HTML content such as a page downloaded with `fetchText`. The available APIs are described below.

## Overview

These functions convert HTML content to plain text or Markdown, which makes web content easier to extract and reuse in automation tasks.

## API Reference

### `convertToText`

Converts HTML content into plain text. This is useful for extracting readable text from web pages.

#### Example

```js
const htmlContent = "<p>Hello, world!</p>";
const text = HTML.convertToText(htmlContent);
// Output will be: "Hello, world!"
```
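To make this transformation concrete, here is a deliberately naive sketch of HTML-to-text conversion using regular expressions. It is an illustration only, not GenAIScript's implementation: the function `htmlToTextNaive` is hypothetical, and regexes cannot cope with comments, `>` inside attribute values, or most entities, all of which a real HTML parser handles.

```js
// Hypothetical, regex-based illustration of HTML-to-text conversion.
// Not GenAIScript's implementation; it only handles simple, well-formed markup.
function htmlToTextNaive(html) {
  // A few common named entities; anything else is left untouched.
  const entities = {
    "&amp;": "&",
    "&lt;": "<",
    "&gt;": ">",
    "&quot;": '"',
    "&#39;": "'",
    "&nbsp;": " ",
  };
  return (
    html
      // Drop script/style blocks entirely, including their contents.
      .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, "")
      // Turn line breaks and the end of block elements into newlines.
      .replace(/<br\s*\/?>/gi, "\n")
      .replace(/<\/(p|div|li|h[1-6])>/gi, "\n")
      // Remove every remaining tag.
      .replace(/<[^>]+>/g, "")
      // Decode entities in a single pass so "&amp;lt;" becomes "&lt;", not "<".
      .replace(/&(amp|lt|gt|quot|#39|nbsp);/g, (entity) => entities[entity])
      // Collapse runs of blank lines and trim the ends.
      .replace(/\n{3,}/g, "\n\n")
      .trim()
  );
}

const text = htmlToTextNaive("<p>Hello, world!</p>");
// text === "Hello, world!"
```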

### `convertToMarkdown`

Converts HTML into Markdown. This is handy for content migration projects or when bringing web content into Markdown-based systems.

#### Example

```js
const htmlContent = "<p>Hello, <strong>world</strong>!</p>";
const markdown = HTML.convertToMarkdown(htmlContent);
// Output will be: "Hello, **world**!"
```
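The Markdown conversion follows the same pattern, with inline tags mapped to Markdown delimiters. The hypothetical `htmlToMarkdownNaive` below covers only `<p>`, `<strong>`/`<b>`, `<em>`/`<i>`, and `<a href>`; it is a sketch for intuition, not GenAIScript's implementation, and nested or attribute-heavy markup needs a real converter.

```js
// Hypothetical, regex-based illustration of HTML-to-Markdown conversion.
// Handles only a handful of simple, well-formed tags; not GenAIScript's implementation.
function htmlToMarkdownNaive(html) {
  return (
    html
      // Bold and italic: <strong>/<b> -> **text**, <em>/<i> -> _text_
      .replace(/<(strong|b)>([\s\S]*?)<\/\1>/gi, "**$2**")
      .replace(/<(em|i)>([\s\S]*?)<\/\1>/gi, "_$2_")
      // Links: <a href="url">label</a> -> [label](url)
      .replace(/<a\s+href="([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi, "[$2]($1)")
      // Paragraph boundaries become blank lines; remaining <p> tags are dropped.
      .replace(/<\/p>\s*<p>/gi, "\n\n")
      .replace(/<\/?p>/gi, "")
      .trim()
  );
}

const markdown = htmlToMarkdownNaive("<p>Hello, <strong>world</strong>!</p>");
// markdown === "Hello, **world**!"
```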

For more details on related APIs, refer to the [GenAIScript documentation](https://microsoft.github.io/genaiscript/).
4 changes: 2 additions & 2 deletions genaisrc/blog-generator.genai.mjs
@@ -1,6 +1,6 @@
script({
description: "Generate a blog post for Dev.to from the documentation",
-model: "openai:gpt-4o",
+model: "openai:gpt-4-turbo",
system: [],
tools: ["fs", "md"],
parameters: {
@@ -96,7 +96,7 @@ let snippet
Use these files to help you generate a topic for the blog post.
- the code will be executed in node.js v20 by the GenAIScript CLI
-- the genaiscript type definition: genaiscript/genaiscript.d.ts. Assume that all globals are ambient. Do not import or require genaiscript module.
+- the genaiscript type definition: genaisrc/genaiscript.d.ts. Assume that all globals are ambient. Do not import or require genaiscript module.
- the genaiscript samples: packages/sample/src/*.genai.*
- the documentation is in markdown and has frontmatter: docs/src/content/docs/**/*.md*
- the online documentation: https://microsoft.github.io/genaiscript/
11 changes: 6 additions & 5 deletions package.json
@@ -33,9 +33,9 @@
"test:core": "cd packages/core && yarn test",
"test:samples": "cd packages/sample && yarn test",
"test:cli": "node packages/cli/built/genaiscript.cjs run code-annotator packages/sample/src/counting.py -l Test -o .genaiscript/tmp/tests/cli -ot .genaiscript/tmp/tests/cli/outtrace.md -oa .genaiscript/tmp/tests/cli/diags.sarif",
-"test:live": "node packages/cli/built/genaiscript.cjs run code-annotator packages/sample/src/counting.py -l Test -o .genaiscript/tmp/tests/cli --retry 1 --temperature 0.5 --no-cache",
-"test:front-matter": "node packages/cli/built/genaiscript.cjs run front-matter SUPPORT.md --no-cache",
-"test:summarize": "node packages/cli/built/genaiscript.cjs run summarize packages/sample/src/rag/markdown.md --no-cache",
+"test:live": "node packages/cli/built/genaiscript.cjs run code-annotator packages/sample/src/counting.py -l Test -o .genaiscript/tmp/tests/cli --retry 1 --temperature 0.5 ",
+"test:front-matter": "node packages/cli/built/genaiscript.cjs run front-matter SUPPORT.md ",
+"test:summarize": "node packages/cli/built/genaiscript.cjs run summarize packages/sample/src/rag/markdown.md ",
"test:pdf": "node packages/cli/built/genaiscript.cjs parse pdf packages/sample/src/rag/loremipsum.pdf",
"test:docx": "node packages/cli/built/genaiscript.cjs parse docx packages/sample/src/rag/Document.docx",
"retrieval:index": "node packages/cli/built/genaiscript.cjs retrieval index \"packages/sample/src/rag/*\"",
@@ -63,8 +63,9 @@
"genai:test": "node packages/cli/built/genaiscript.cjs run test-gen",
"genai:blog-post": "node packages/cli/built/genaiscript.cjs run blog-generator",
"genai:readme": "node packages/cli/built/genaiscript.cjs run readme-updater",
-"genai:blogify": "node packages/cli/built/genaiscript.cjs run blogify-sample --no-cache",
-"genai:tweetify": "node packages/cli/built/genaiscript.cjs run tweetify --no-cache",
+"genai:blogify": "node packages/cli/built/genaiscript.cjs run blogify-sample",
+"genai:tweetify": "node packages/cli/built/genaiscript.cjs run tweetify",
+"genai:docify": "node packages/cli/built/genaiscript.cjs run docify",
"gcm": "node packages/cli/built/genaiscript.cjs run gcm"
},
"release-it": {
2 changes: 1 addition & 1 deletion packages/core/src/genaisrc/system.fs_read_file.genai.js
@@ -5,7 +5,7 @@ system({

defTool(
"fs_read_file",
-"Reads a file as text from the file system.",
+"Reads a file as text from the file system. Returns undefined if the file does not exist.",
{
type: "object",
properties: {
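The updated `fs_read_file` description tells the model that reading a missing file returns `undefined`. In plain Node.js, that contract can be sketched by treating only `ENOENT` as "not found"; the helper name `readFileOrUndefined` is hypothetical and this is not the tool's actual implementation.

```js
const { readFileSync } = require("node:fs");

// Hypothetical helper illustrating the documented contract:
// return the file's text, or undefined if the file does not exist.
function readFileOrUndefined(filename) {
  try {
    return readFileSync(filename, { encoding: "utf8" });
  } catch (err) {
    // ENOENT means "no such file or directory".
    if (err && err.code === "ENOENT") return undefined;
    // Other failures (permissions, reading a directory, ...) are rethrown.
    throw err;
  }
}
```

Rethrowing every other error keeps genuine failures, such as permission problems, distinguishable from a file that simply is not there.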
30 changes: 30 additions & 0 deletions packages/sample/genaisrc/docify.genai.mts
@@ -0,0 +1,30 @@
script({
model: "openai:gpt-4-turbo",
tools: ["fs", "md"],
})

const api = env.vars.api + ""

$`You are an expert technical writer for the GenAIScript language.
## Task
Generate a documentation page about the ${api}.
Save to file in the docs/src/content/docs/reference/scripts folder.
## Information
- use markdown, with Astro Starlight syntax
- the genaiscript type definition: genaisrc/genaiscript.d.ts. Assume that all globals are ambient. Do not import or require genaiscript module.
- the documentation is in markdown and has frontmatter: docs/src/content/docs/**/*.md*
- the online documentation: https://microsoft.github.io/genaiscript/
- the genaiscript samples: packages/sample/src/*.genai.*
- document each api separately with a short example
- use "js" language for genai code blocks
- link to online documentation for related apis
- use const keyword for all variables if possible
- do not add console.log to snippets
- minimize changes to existing documentation
`

defFileOutput("docs/src/content/docs/reference/scripts/*.md", "Documentation pages")
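In `docify`, `env.vars.api + ""` coerces the variable to a string, so if no `api` variable is supplied the prompt silently asks for documentation about the literal text "undefined". A minimal sketch of a stricter lookup, written as a plain function over a vars object (the name `requireVar` is hypothetical, not a GenAIScript API):

```js
// Hypothetical helper: read a required script variable from a plain object
// (standing in for env.vars) and fail loudly instead of producing "undefined".
function requireVar(vars, name) {
  const value = vars?.[name];
  if (value === undefined || value === null || String(value).trim() === "") {
    throw new Error(`missing required variable: ${name}`);
  }
  return String(value);
}

// The coercion used in docify turns a missing value into a string:
const coerced = undefined + ""; // "undefined"

const api = requireVar({ api: "HTML" }, "api"); // "HTML"
```

Failing fast surfaces a forgotten variable immediately instead of producing a page about "undefined"; inside the script the call would be `requireVar(env.vars, "api")`, assuming `env.vars` can be read like a plain object.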
10 changes: 0 additions & 10 deletions packages/sample/genaisrc/html-to-text.genai.js

This file was deleted.

20 changes: 20 additions & 0 deletions packages/sample/genaisrc/html.genai.mjs
@@ -0,0 +1,20 @@
script({
model: "openai:gpt-3.5-turbo",
title: "HTML to Text",
tests: {},
})

const { text: html } = await fetchText(
"https://microsoft.github.io/genaiscript/getting-started/"
)
const text = HTML.convertToText(html)
def("TEXT", text)

const md = HTML.convertToMarkdown(html)
const v = def("MARKDOWN", md)

const tables = HTML.convertTablesToJSON(html)
defData("TABLES", tables)

$`Compare TEXT and MARKDOWN.
Analyze the TABLES data.`
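The sample passes the result of `HTML.convertTablesToJSON` to `defData`. As a rough mental model only, assume each HTML table becomes an array of row objects keyed by its header cells; the exact return type is defined in `genaisrc/genaiscript.d.ts` and may differ. A hypothetical regex-based sketch of that mapping (`tablesToJsonNaive` is not a GenAIScript function):

```js
// Hypothetical illustration: map each <table> to an array of row objects
// keyed by the cells of its first row. Assumes simple, well-formed markup
// (no rowspan/colspan, no nested tables); not GenAIScript's implementation.
function tablesToJsonNaive(html) {
  const stripTags = (s) => s.replace(/<[^>]+>/g, "").trim();
  const tables = [];
  for (const [, tableHtml] of html.matchAll(/<table\b[^>]*>([\s\S]*?)<\/table>/gi)) {
    // Collect the text of every <th>/<td> cell, row by row.
    const rows = [...tableHtml.matchAll(/<tr\b[^>]*>([\s\S]*?)<\/tr>/gi)].map(([, rowHtml]) =>
      [...rowHtml.matchAll(/<(th|td)\b[^>]*>([\s\S]*?)<\/\1>/gi)].map(([, , cell]) =>
        stripTags(cell)
      )
    );
    if (rows.length === 0) continue;
    // Treat the first row as the header and key the remaining rows by it.
    const [header, ...body] = rows;
    tables.push(
      body.map((cells) => Object.fromEntries(header.map((key, i) => [key, cells[i] ?? ""])))
    );
  }
  return tables;
}
```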
