[Proposal] Investigate Claude #16

Open
danieljharris opened this issue Sep 19, 2023 · 2 comments

Comments

@danieljharris

It might be worth investigating https://claude.ai/ for code migration. Because it accepts large prompts (around 75,000 words), it could potentially take in all of Unity's documentation along with the documentation of the target engine, and then make a better-informed code transition.

I started looking into this myself, but so far I've been unable to find a downloadable text/PDF version of Unity's full documentation. If anyone has any ideas on how to get it, let me know.

@Blade67

Blade67 commented Sep 19, 2023

Might be worth considering scraping Unity's docs and mapping its classes and functions to their equivalents in other engines.
Using a mapping would also allow us to stay with ChatGPT/OpenAI, although Claude.AI's pricing is slightly cheaper for larger data sets.
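A minimal sketch of the mapping idea, written in Python for illustration. The dictionary entries are approximate Unity-to-Unreal pairings chosen for this example (an assumption, not an authoritative or complete mapping), and `UNITY_TO_UNREAL` and `annotate_with_mapping` are hypothetical names. The point is that only the entries relevant to a given script need to go into the prompt, rather than whole documentation sets.

```python
import re

# Illustrative, approximate Unity -> Unreal pairings (assumption for this
# sketch; a real mapping would be built from the scraped docs).
UNITY_TO_UNREAL = {
    "GameObject": "AActor",
    "MonoBehaviour": "UActorComponent",
    "Transform": "USceneComponent",
    "Debug.Log": "UE_LOG",
}


def annotate_with_mapping(source: str, mapping: dict[str, str]) -> list[str]:
    """Return a prompt hint for every mapped Unity identifier found in `source`."""
    hints = []
    for unity_name, engine_name in mapping.items():
        # Whole-word, case-sensitive match, so "Transform" does not match "Transformer".
        pattern = r"\b" + re.escape(unity_name) + r"\b"
        if re.search(pattern, source):
            hints.append(f"Unity `{unity_name}` roughly corresponds to `{engine_name}`.")
    return hints


unity_script = """
public class Player : MonoBehaviour {
    void Start() { Debug.Log(transform.position); }
}
"""
```

Calling `annotate_with_mapping(unity_script, UNITY_TO_UNREAL)` returns hints for `MonoBehaviour` and `Debug.Log` only; these hints (plus the script) could then be sent to either ChatGPT or Claude, which keeps the prompt small regardless of the model.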

@danieljharris

danieljharris commented Sep 20, 2023

I found a downloadable offline version of Unity's documentation that should do the trick for the Unity side. There are 2,046 files, but I think they can be filtered down to just the "class-" ones, of which there are only 205. These would need to be combined into a single file for Claude to use. I have this PowerShell script that does this; unfortunately, the output is still over the text limit even with the HTML and link elements removed, so this might not be a good approach to code migration:

```powershell
# Define the directory where the .html files are stored
$sourceDirectory = ".\"

# Define the output .txt file
$outputFile = ".\FilteredHtmlContents.txt"

# Remove the existing output file if it exists
if (Test-Path $outputFile) {
    Remove-Item $outputFile
}

# Regex pattern to match lines that start with <p> and contain text or a heading
$regexPattern = "^<p>.*(<h[1-6]>.*<\/h[1-6]>|[^<]+).*$"

# Loop through each "class-" .html file in the directory (the class reference pages)
Get-ChildItem -Path $sourceDirectory -Filter "class-*.html" | ForEach-Object {
    # Read the content of the .html file, one line per element
    $content = Get-Content $_.FullName

    # Keep only the lines that match the regex pattern
    $filteredContent = $content | Select-String -Pattern $regexPattern

    # Strip HTML from each matching line and append it to the output .txt file
    if ($filteredContent) {
        $filteredContent | ForEach-Object {
            $line = $_.Line

            # Skip lines that only contain an empty link or an image
            # ("return" inside ForEach-Object moves on to the next line)
            if ($line -match "<p><a [^>]+><\/a><\/p>" -or $line -match "<p><img [^>]+><\/p>") {
                return
            }

            # Replace links with their link text
            $line = $line -replace '<a href="[^"]+">(.*?)<\/a>', '$1'

            # Remove all remaining HTML tags, then decode entities such as &amp;
            $line = $line -replace "<.*?>", ""
            $line = [System.Net.WebUtility]::HtmlDecode($line).Trim()

            # Skip lines that are empty after stripping, to keep the output small
            if ([string]::IsNullOrWhiteSpace($line)) {
                return
            }

            Add-Content -Path $outputFile -Value $line -Encoding UTF8
        }
    }
}

Write-Host "Filtered lines from .html files have been written into $outputFile"
```
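Since the combined output is still over the limit, one option (not tried in this thread) is to split it into several prompts that each stay under the budget. A minimal sketch in Python, assuming the ~75,000-word figure quoted at the top of the issue as the budget; real limits are counted in tokens, so some headroom is advisable. `WORD_BUDGET`, `chunk_by_words`, and `split_file` are hypothetical names, and the input is assumed to be the UTF-8 `FilteredHtmlContents.txt` produced by the script.

```python
from pathlib import Path

# Approximate prompt budget from earlier in the thread (assumption: limits are
# really measured in tokens, so a smaller value may be safer).
WORD_BUDGET = 75_000


def chunk_by_words(lines: list[str], budget: int) -> list[list[str]]:
    """Greedily pack whole lines into chunks of at most `budget` words each.

    A single line longer than the budget still gets its own (oversized) chunk.
    """
    chunks: list[list[str]] = []
    current: list[str] = []
    current_words = 0
    for line in lines:
        words = len(line.split())
        # Start a new chunk when adding this line would exceed the budget.
        if current and current_words + words > budget:
            chunks.append(current)
            current, current_words = [], 0
        current.append(line)
        current_words += words
    if current:
        chunks.append(current)
    return chunks


def split_file(path: Path, budget: int = WORD_BUDGET) -> list[Path]:
    """Write each chunk of `path` to a numbered .txt file next to it."""
    # "utf-8-sig" also strips the BOM that Windows PowerShell adds for UTF8;
    # undecodable bytes are replaced rather than raising an error.
    text = path.read_text(encoding="utf-8-sig", errors="replace")
    outputs = []
    for i, chunk in enumerate(chunk_by_words(text.splitlines(), budget), start=1):
        out = path.with_name(f"{path.stem}_part{i}.txt")
        out.write_text("\n".join(chunk), encoding="utf-8")
        outputs.append(out)
    return outputs
```

For example, `split_file(Path("FilteredHtmlContents.txt"))` would write `FilteredHtmlContents_part1.txt`, `FilteredHtmlContents_part2.txt`, and so on. Packing whole lines keeps paragraphs intact, but each chunk would then be a separate prompt, so the model never sees the full documentation at once.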

According to this post, Unreal's entire documentation is already downloaded when you install Unreal, at `C:\Program Files\Epic Games\UE_5.1\Engine\Documentation\Builds`. I'm guessing it will also need to be combined into a single file.

@bshikin changed the title from "Investigate Claude" to "[Proposal] Investigate Claude" on Sep 21, 2023