Releases: chigkim/VOCR
VOCR v2.1.0
VOCR v2.0.1
Changelog
v2.0.1
- VOCR no longer crashes when VoiceOver is not running. Instead, it speaks with the system speech synthesizer.
v2.0.0
- Ask AI (OpenAI GPT-4, Ollama)
- Explore with AI (OpenAI GPT-4 Only)
- Capture image with a camera and ask AI: Command+Shift+Control+C
- Supports FaceTime, external, and iPhone cameras (via the Continuity Camera feature)
- Open image files from Finder
- VOCR Menu: Command+Shift+Control+S
- Real-time OCR: Command+Shift+Control+R
- Object detection for icons
- Auto Scan
- Customize shortcuts
- Auto updates
- Save last image
- Save OCR results
- Faster VOCursor scanning
- Target window feature
- Disable mouse movement
- Launch on login
- Startup sound
- Logger
VOCR v2.0.0
Changelog
- Ask AI (OpenAI GPT-4, Ollama)
- Explore with AI (OpenAI GPT-4 Only)
- Capture image with a camera and ask AI: Command+Shift+Control+C
- Supports FaceTime, external, and iPhone cameras (via the Continuity Camera feature)
- Open image files from Finder
- VOCR Menu: Command+Shift+Control+S
- Real-time OCR: Command+Shift+Control+R
- Object detection for icons
- Auto Scan
- Customize shortcuts
- Auto updates
- Save last image
- Save OCR results
- Faster VOCursor scanning
- Target window feature
- Disable mouse movement
- Launch on login
- Startup sound
- Logger
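Real-time OCR re-scans the same region repeatedly, so speaking every result would keep repeating unchanged text. The pre-release notes mention a diff algorithm that makes real-time OCR less verbose, but they don't describe it. The following is a minimal Python sketch of the general idea, assuming each OCR pass yields a list of text lines and that only lines that are new since the previous pass should be announced. It uses the standard library's `difflib.SequenceMatcher`; the helper name `lines_to_announce` is hypothetical and this is not VOCR's own algorithm.

```python
import difflib
from typing import List


def lines_to_announce(previous: List[str], current: List[str]) -> List[str]:
    """Return lines in `current` that were inserted or replaced relative to `previous`.

    Hypothetical helper: unchanged lines are skipped so a screen reader
    only speaks what is new since the last OCR pass.
    """
    matcher = difflib.SequenceMatcher(a=previous, b=current, autojunk=False)
    new_lines: List[str] = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # 'replace' and 'insert' cover text that is new in `current`;
        # 'equal' and 'delete' produce nothing to speak.
        if tag in ("replace", "insert"):
            new_lines.extend(current[j1:j2])
    return new_lines


if __name__ == "__main__":
    first_pass = ["Downloading file", "Progress: 10%", "Cancel"]
    second_pass = ["Downloading file", "Progress: 45%", "Cancel"]
    print(lines_to_announce(first_pass, second_pass))  # ['Progress: 45%']
```

Diffing whole lines keeps the output short when only a counter or status line changes; a character-level diff would be more precise but would announce fragments that are harder to understand by ear.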
VOCR v2.0.0-beta.3
Changelog
- Camera Capture: Command+Control+Shift+C
- Settings > Choose camera to select an external camera
- Uses gpt-4o for 50% lower cost and faster responses.
- A shortcut cannot be deleted.
- New updates submenu
- You can toggle automatic update checks and automatic update installation from the menu.
- Pre-release channel
- Kills other running instances of VOCR.
- Store API key in Keychain:
  - Quit VOCR.
  - Permanently delete ~/Library/Preferences/com.chikim.VOCR.plist (in Finder, Command+Option+Delete).
  - Reboot.
- Added permission for Notification Center.
- Fixed the menu not working after closing a window.
- The logger recreates its log file if the file is deleted.
- Checks for updates at launch.
- Increased the request timeout to 10 minutes.
- Plays a sound when VOCR has launched and is ready.
- Alerts about updates through Notification Center.
- Fixed an error when encountering an Ollama model with no families.
- Realtime OCR shortcut toggles the feature.
- Autoupdater
- Implemented logger
- Asks which Ollama model to use if multiple clip models are found.
- You can also select an Ollama model by clicking Ollama in the model menu.
- Asks for a prompt after taking a screenshot.
- New prompt for explore
- Explore no longer generates images meant for debugging.
- Presents the same menu whether launched by the shortcut or by clicking the status bar item.
- Reports more errors when a request fails.
- Cancels the previous request when a new request is made.
- Ollama support
- Uses the original screenshot resolution instead of the window resolution in points, except in explore mode.
- New Workflow: Use Command+Control+Shift+W/V to set the target to a window/VOCursor and perform the OCR scan. After that, features such as real-time OCR, explore, and ask will use the target.
- Resets shortcuts if the set of features has changed after an update.
- Bug fix: global shortcuts sometimes not active
- Customize shortcuts
- Token usage at the end of the description
- Support for a system prompt for GPT
- Setting to reuse the last prompt without asking
- Save last screenshot
- Dismiss the menu with Command+Z instead of Esc if realtime OCR or navigation is active.
- You can press Return to ask GPT without editing the prompt.
- Changed the diff algorithm to make realtime OCR less verbose.
- Realtime OCR remains active at its initial location, allowing you to move the VOCursor during the process. To perform realtime OCR in a different location, stop the OCR, move the VOCursor, then restart realtime OCR.
- Realtime OCR of VOCursor: Command+Control+Shift+R
- Object detection can be toggled in Settings.
- OCR Window: Command+Control+Shift+W
- OCR VOCursor: Command+Control+Shift+V
- Ask GPT about VOCursor: Command+Control+Shift+A
- Settings: Command+Control+Shift+S
- Faster screenshot of VOCursor
- Open an image file in VOCR from Finder to ask GPT
- GPT responses are copied to the clipboard, so you can paste them elsewhere if you miss them.
- Object detection through rectangles: detects boxes without text, such as icons.
- Moved save OCR result to the menu.
- Moved target window to settings menu.
- Auto Scan: Thanks @vick08
- Readme Improvement: Thanks @ssawczyn
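The item about using the original screenshot resolution instead of window points matters on Retina displays, where window frames are measured in points but screenshots are measured in pixels (typically 2 pixels per point). As a minimal sketch, assuming an OCR engine that returns bounding boxes normalized to the range 0 to 1 with a bottom-left origin (as Apple's Vision framework does), the Python below converts such a box into top-left pixel coordinates of the screenshot and then into a screen position in points. The names `Rect`, `normalized_to_pixels`, `pixels_to_points`, and `center_on_screen`, and the 2.0 scale factor, are illustrative assumptions, not VOCR's code.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle with a top-left origin."""
    x: float
    y: float
    width: float
    height: float


def normalized_to_pixels(nx: float, ny: float, nw: float, nh: float,
                         image_width: int, image_height: int) -> Rect:
    """Convert a normalized box (0..1, bottom-left origin, `ny` = bottom edge)
    into screenshot pixels with a top-left origin."""
    x = nx * image_width
    y = (1.0 - ny - nh) * image_height  # flip the vertical axis
    return Rect(x, y, nw * image_width, nh * image_height)


def pixels_to_points(rect: Rect, scale: float) -> Rect:
    """Divide by the display's backing scale factor (assumed 2.0 on Retina)."""
    return Rect(rect.x / scale, rect.y / scale,
                rect.width / scale, rect.height / scale)


def center_on_screen(rect_px: Rect, window_x: float, window_y: float,
                     scale: float) -> Tuple[float, float]:
    """Center of a box found in a window screenshot, in screen points,
    given the window's top-left corner in screen points (top-left origin)."""
    pts = pixels_to_points(rect_px, scale)
    return (window_x + pts.x + pts.width / 2,
            window_y + pts.y + pts.height / 2)


if __name__ == "__main__":
    # A 1600x1000-pixel screenshot of a window whose top-left corner is at (100, 50) points.
    box = normalized_to_pixels(0.25, 0.5, 0.5, 0.25, 1600, 1000)
    print(box)
    print(center_on_screen(box, 100, 50, scale=2.0))
```

Keeping OCR results in pixel space and converting to points only when moving the mouse avoids losing detail to a downscaled capture.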
The GPT features use GPT-4V and require your own OpenAI API key.
The usage cost reported by VOCR is an estimate. For official usage and cost, refer to the Usage Dashboard on the OpenAI website, where you can also set a monthly limit and alerts.
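VOCR shows token usage at the end of each description, and a cost estimate follows from simple arithmetic: tokens multiplied by a per-token price, with input (prompt and image) and output tokens usually priced differently. The sketch below assumes prices quoted in USD per million tokens; the function name `estimate_cost_usd` is hypothetical and the rates in the example are placeholders, not OpenAI's actual prices, so check OpenAI's pricing page for real numbers.

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_price_per_million: float,
                      output_price_per_million: float) -> float:
    """Estimate the cost of one request from its token counts.

    Prices are in USD per one million tokens; the example values below
    are placeholders, not real OpenAI rates.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (input_tokens * input_price_per_million
            + output_tokens * output_price_per_million) / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 1,000 prompt tokens and 500 completion tokens
    # at placeholder rates of $5 and $15 per million tokens.
    print(f"${estimate_cost_usd(1_000, 500, 5.0, 15.0):.4f}")  # $0.0125
```

Because the dashboard applies the provider's current rates and billing rules, a local estimate like this can drift from the official figure.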
The explore feature only works with GPT, and the location information returned by the model is extremely unreliable and inaccurate.
Instruction for Ollama
- Download and install Ollama.
- Open Terminal and type "ollama pull llava" without the quotes.
- Wait for Ollama to finish downloading the model.
- Quit Terminal.
- Go to VOCR menu > Settings > Models and select Ollama
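Once the model is pulled, Ollama serves a local HTTP API (by default at http://localhost:11434). As a rough sketch of what a client has to do, not VOCR's own Swift code, the Python below lists installed models via `/api/tags`, keeps those whose `details.families` includes `clip` (treating a missing or null families list as empty), and sends an image to `/api/generate` with a 10-minute timeout to match the request timeout in the notes. The helper names, the default prompt, and the choice of the first matching model are assumptions for illustration.

```python
import base64
import json
import urllib.request
from typing import Any, Dict, List

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local address
TIMEOUT_SECONDS = 600  # 10 minutes, matching the request timeout in the notes


def vision_models(tags_response: Dict[str, Any]) -> List[str]:
    """Return names of models whose families include 'clip'.

    `details` or `families` may be missing or null, so treat them as empty.
    """
    names: List[str] = []
    for model in tags_response.get("models") or []:
        families = (model.get("details") or {}).get("families") or []
        if "clip" in families:
            names.append(model["name"])
    return names


def build_generate_payload(model: str, prompt: str, image_bytes: bytes) -> Dict[str, Any]:
    """Request body for /api/generate with one base64-encoded image."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for a single JSON reply instead of a stream
    }


def get_json(path: str) -> Dict[str, Any]:
    with urllib.request.urlopen(OLLAMA_URL + path, timeout=TIMEOUT_SECONDS) as response:
        return json.load(response)


def post_json(path: str, body: Dict[str, Any]) -> Dict[str, Any]:
    request = urllib.request.Request(
        OLLAMA_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=TIMEOUT_SECONDS) as response:
        return json.load(response)


def describe_image(path: str, prompt: str = "Describe this image.") -> str:
    """Send an image file to the first clip-family model and return its reply."""
    models = vision_models(get_json("/api/tags"))
    if not models:
        raise RuntimeError("No clip-family model found; try 'ollama pull llava'.")
    with open(path, "rb") as image_file:
        payload = build_generate_payload(models[0], prompt, image_file.read())
    return post_json("/api/generate", payload)["response"]
```

With Ollama running, `print(describe_image("screenshot.png"))` should print the model's description of a local image file; the file name is a placeholder.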
Experimental
These features may not make it into the public release.
- Identify object when navigation is active: Command+Control+I
- Explore window with GPT: Command+Control+Shift+E
- An option to use a local model such as LLaVA through llama.cpp instead of GPT.
Warning: Setting up your own llama.cpp server is very complex.
VOCR v2.0.0-beta.2
Changelog
- New updates submenu
- You can toggle automatic update checks and automatic update installation from the menu.
- Pre-release channel
- Kills other running instances of VOCR.
- Store API key in Keychain:
  - Quit VOCR.
  - Permanently delete ~/Library/Preferences/com.chikim.VOCR.plist (in Finder, Command+Option+Delete).
  - Reboot.
- Added permission for Notification Center.
- Fixed the menu not working after closing a window.
- The logger recreates its log file if the file is deleted.
- Checks for updates at launch.
- Increased the request timeout to 10 minutes.
- Plays a sound when VOCR has launched and is ready.
- Alerts about updates through Notification Center.
- Fixed an error when encountering an Ollama model with no families.
- Realtime OCR shortcut toggles the feature.
- Autoupdater
- Implemented logger
- Asks which Ollama model to use if multiple clip models are found.
- You can also select an Ollama model by clicking Ollama in the model menu.
- Asks for a prompt after taking a screenshot.
- New prompt for explore
- Explore no longer generates images meant for debugging.
- Presents the same menu whether launched by the shortcut or by clicking the status bar item.
- Reports more errors when a request fails.
- Cancels the previous request when a new request is made.
- Ollama support
- Uses the original screenshot resolution instead of the window resolution in points, except in explore mode.
- New Workflow: Use Command+Control+Shift+W/V to set the target to a window/VOCursor and perform the OCR scan. After that, features such as real-time OCR, explore, and ask will use the target.
- Resets shortcuts if the set of features has changed after an update.
- Bug fix: global shortcuts sometimes not active
- Customize shortcuts
- Token usage at the end of the description
- Support for a system prompt for GPT
- Setting to reuse the last prompt without asking
- Save last screenshot
- Dismiss the menu with Command+Z instead of Esc if realtime OCR or navigation is active.
- You can press Return to ask GPT without editing the prompt.
- Changed the diff algorithm to make realtime OCR less verbose.
- Realtime OCR remains active at its initial location, allowing you to move the VOCursor during the process. To perform realtime OCR in a different location, stop the OCR, move the VOCursor, then restart realtime OCR.
- Realtime OCR of VOCursor: Command+Control+Shift+R
- Object detection can be toggled in Settings.
- OCR Window: Command+Control+Shift+W
- OCR VOCursor: Command+Control+Shift+V
- Ask GPT about VOCursor: Command+Control+Shift+A
- Settings: Command+Control+Shift+S
- Faster screenshot of VOCursor
- Open an image file in VOCR from Finder to ask GPT
- GPT responses are copied to the clipboard, so you can paste them elsewhere if you miss them.
- Object detection through rectangles: detects boxes without text, such as icons.
- Moved save OCR result to the menu.
- Moved target window to settings menu.
- Auto Scan: Thanks @vick08
- Readme Improvement: Thanks @ssawczyn
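The item "Cancels previous request when making new request" describes a latest-request-wins pattern: if you ask again before the model answers, the older request is abandoned so its late answer is never spoken. VOCR's implementation isn't shown in these notes; the following Python asyncio sketch only illustrates the idea, and the class `LatestRequest` and the simulated `slow_answer` coroutine are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class LatestRequest:
    """Keep at most one request in flight; submitting a new one cancels the old one."""

    def __init__(self) -> None:
        self._task: Optional[asyncio.Task] = None

    def submit(self, make_request: Callable[[], Awaitable[str]]) -> asyncio.Task:
        # Cancel the previous request if it has not finished yet.
        if self._task is not None and not self._task.done():
            self._task.cancel()
        # Must be called while an event loop is running.
        self._task = asyncio.ensure_future(make_request())
        return self._task


async def slow_answer(text: str, delay: float) -> str:
    """Stand-in for a network call to a model (hypothetical)."""
    await asyncio.sleep(delay)
    return text


async def demo() -> None:
    requests = LatestRequest()
    first = requests.submit(lambda: slow_answer("first answer", 1.0))
    second = requests.submit(lambda: slow_answer("second answer", 0.1))
    print(await second)       # second answer
    print(first.cancelled())  # True


if __name__ == "__main__":
    asyncio.run(demo())
```

Cancelling instead of letting both finish also avoids paying for tokens on an answer nobody will hear, at least for requests that haven't been processed yet.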
The GPT features use GPT-4V and require your own OpenAI API key.
The usage cost reported by VOCR is an estimate. For official usage and cost, refer to the Usage Dashboard on the OpenAI website, where you can also set a monthly limit and alerts.
The explore feature only works with GPT, and the location information returned by the model is extremely unreliable and inaccurate.
Instruction for Ollama
- Download and install Ollama.
- Open Terminal and type "ollama run llava" without the quotes.
- Wait until you see the prompt ">>> send a message".
- Then type /bye and press Return.
- Quit Terminal.
- Go to VOCR menu > Settings > Models and select Ollama
Experimental
These features may not make it into the public release.
- Identify object when navigation is active: Command+Control+I
- Explore window with GPT: Command+Control+Shift+E
- An option to use a local model such as LLaVA through llama.cpp instead of GPT.
Warning: Setting up your own llama.cpp server is very complex.
VOCR v2.0.0-beta.1
Changelog
- Pre-release channel
- Kills other running instances of VOCR.
- Store API key in Keychain:
  - Quit VOCR.
  - Permanently delete ~/Library/Preferences/com.chikim.VOCR.plist (in Finder, Command+Option+Delete).
  - Reboot.
- Added permission for Notification Center.
- Fixed the menu not working after closing a window.
- The logger recreates its log file if the file is deleted.
- Checks for updates at launch.
- Increased the request timeout to 10 minutes.
- Plays a sound when VOCR has launched and is ready.
- Alerts about updates through Notification Center.
- Fixed an error when encountering an Ollama model with no families.
- Realtime OCR shortcut toggles the feature.
- Autoupdater
- Implemented logger
- Asks which Ollama model to use if multiple clip models are found.
- You can also select an Ollama model by clicking Ollama in the model menu.
- Asks for a prompt after taking a screenshot.
- New prompt for explore
- Explore no longer generates images meant for debugging.
- Presents the same menu whether launched by the shortcut or by clicking the status bar item.
- Reports more errors when a request fails.
- Cancels the previous request when a new request is made.
- Ollama support
- Uses the original screenshot resolution instead of the window resolution in points, except in explore mode.
- New Workflow: Use Command+Control+Shift+W/V to set the target to a window/VOCursor and perform the OCR scan. After that, features such as real-time OCR, explore, and ask will use the target.
- Resets shortcuts if the set of features has changed after an update.
- Bug fix: global shortcuts sometimes not active
- Customize shortcuts
- Token usage at the end of the description
- Support for a system prompt for GPT
- Setting to reuse the last prompt without asking
- Save last screenshot
- Dismiss the menu with Command+Z instead of Esc if realtime OCR or navigation is active.
- You can press Return to ask GPT without editing the prompt.
- Changed the diff algorithm to make realtime OCR less verbose.
- Realtime OCR remains active at its initial location, allowing you to move the VOCursor during the process. To perform realtime OCR in a different location, stop the OCR, move the VOCursor, then restart realtime OCR.
- Realtime OCR of VOCursor: Command+Control+Shift+R
- Object detection can be toggled in Settings.
- OCR Window: Command+Control+Shift+W
- OCR VOCursor: Command+Control+Shift+V
- Ask GPT about VOCursor: Command+Control+Shift+A
- Settings: Command+Control+Shift+S
- Faster screenshot of VOCursor
- Open an image file in VOCR from Finder to ask GPT
- GPT responses are copied to the clipboard, so you can paste them elsewhere if you miss them.
- Object detection through rectangles: detects boxes without text, such as icons.
- Moved save OCR result to the menu.
- Moved target window to settings menu.
- Auto Scan: Thanks @vick08
- Readme Improvement: Thanks @ssawczyn
The GPT features use GPT-4V and require your own OpenAI API key.
The usage cost reported by VOCR is an estimate. For official usage and cost, refer to the Usage Dashboard on the OpenAI website, where you can also set a monthly limit and alerts.
The explore feature only works with GPT, and the location information returned by the model is extremely unreliable and inaccurate.
Instruction for Ollama
- Download and install Ollama.
- Open Terminal and type "ollama run llava" without the quotes.
- Wait until you see the prompt ">>> send a message".
- Then type /bye and press Return.
- Quit Terminal.
- Go to VOCR menu > Settings > Models and select Ollama
Experimental
These features may not make it into the public release.
- Identify object when navigation is active: Command+Control+I
- Explore window with GPT: Command+Control+Shift+E
- An option to use a local model such as LLaVA through llama.cpp instead of GPT.
Warning: Setting up your own llama.cpp server is very complex.
VOCR v1.0.0-beta.2
Fixed a crash when there is no window. Thanks @vick08
VOCR v1.0.0-beta.1
HIGHLY EXPERIMENTAL: USE AT YOUR OWN RISK!
You can now choose sound output for positional audio feedback.
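Positional audio feedback conveys where an element sits on screen through stereo placement. The notes don't say how VOCR maps position to sound, so this is only an illustrative sketch, assuming a constant-power pan law that maps an element's horizontal center (0 at the left edge of the screen, 1 at the right) to left and right channel gains. The functions `pan_gains` and `x_fraction` are hypothetical.

```python
import math
from typing import Tuple


def pan_gains(x_fraction: float) -> Tuple[float, float]:
    """Map a horizontal position (0.0 = left edge, 1.0 = right edge) to
    (left_gain, right_gain) using a constant-power pan law.

    left**2 + right**2 == 1 for every position, so perceived loudness
    stays roughly constant as the sound moves across the stereo field.
    """
    x = min(max(x_fraction, 0.0), 1.0)  # clamp positions outside the screen
    angle = x * math.pi / 2             # 0 rad = hard left, pi/2 rad = hard right
    return math.cos(angle), math.sin(angle)


def x_fraction(element_center_x: float, screen_width: float) -> float:
    """Hypothetical helper: where an element's center falls across the screen."""
    return element_center_x / screen_width


if __name__ == "__main__":
    # An element centered at x = 480 on a 1920-point-wide screen sits a quarter of the way across.
    left, right = pan_gains(x_fraction(480, 1920))
    print(f"left={left:.3f} right={right:.3f}")
```

A linear pan (left = 1 - x, right = x) is simpler but makes sounds near the center noticeably quieter, which is why constant-power panning is a common default.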
VOCR v1.0.0-alpha.3
HIGHLY EXPERIMENTAL: USE AT YOUR OWN RISK
VOCR v1.0.0-alpha.2
HIGHLY EXPERIMENTAL: USE AT YOUR OWN RISK
Added import feature