[AI Chat] Introduces support for host-specific distiller scripts (#25722)

This change introduces host-specific site distiller scripts, improving how content is parsed and presented for AI-driven text distillation. We’ve refined host retrieval from page origins, improved parsing and decoding logic, and addressed unsafe buffer issues. Distiller resources have been reorganized into a dedicated directory, while unnecessary complexity and reliance on regex-based lookups have been removed.

We’ve switched from creating XML documents to HTML documents for more relaxed parsing rules, simplified the distillation process by removing events, and adopted absl::StrFormat for cleaner string handling. Namespace isolation and resource visibility controls have been tightened, and a disabled-by-default feature flag is now available for incremental rollout.

Additional improvements include better handling of profile pages on X.com (formerly Twitter), clearer metadata prefaces to help LLMs understand where the page content begins, early returns on failed script lookups, and consistency in naming and formatting. Debugging logs and other extraneous elements have been removed to streamline the final implementation.
jonathansampson authored Dec 11, 2024
1 parent 8aeb9e5 commit 97fccd9
Showing 27 changed files with 1,384 additions and 46 deletions.
97 changes: 52 additions & 45 deletions browser/about_flags.cc
@@ -356,51 +356,58 @@
#define BRAVE_MIDDLE_CLICK_AUTOSCROLL_FEATURE_ENTRY
#endif

#define BRAVE_AI_CHAT_FEATURE_ENTRIES \
EXPAND_FEATURE_ENTRIES( \
{ \
"brave-ai-chat", \
"Brave AI Chat", \
"Summarize articles and engage in conversation with AI", \
kOsWin | kOsMac | kOsLinux | kOsAndroid, \
FEATURE_VALUE_TYPE(ai_chat::features::kAIChat), \
}, \
{ \
"brave-ai-chat-history", \
"Brave AI Chat History", \
"Enables AI Chat History persistence and management", \
kOsWin | kOsMac | kOsLinux, \
FEATURE_VALUE_TYPE(ai_chat::features::kAIChatHistory), \
}, \
{ \
"brave-ai-chat-context-menu-rewrite-in-place", \
"Brave AI Chat Rewrite In Place From Context Menu", \
"Enables AI Chat rewrite in place feature from the context menu", \
kOsDesktop, \
FEATURE_VALUE_TYPE(ai_chat::features::kContextMenuRewriteInPlace), \
}, \
{ \
"brave-ai-chat-page-content-refine", \
"Brave AI Chat Page Content Refine", \
"Enable local text embedding for long page content in order to " \
"find " \
"most relevant parts to the prompt within context limit.", \
kOsDesktop | kOsAndroid, \
FEATURE_VALUE_TYPE(ai_chat::features::kPageContentRefine), \
}, \
{ \
"brave-ai-chat-allow-private-ips", \
"Private IP Addresses for Custom Model Endpoints", \
"Permits the use of private IP addresses as model endpoint URLs", \
kOsWin | kOsMac | kOsLinux | kOsAndroid, \
FEATURE_VALUE_TYPE(ai_chat::features::kAllowPrivateIPs), \
}, \
{ \
"brave-ai-chat-open-leo-from-brave-search", \
"Open Leo AI Chat from Brave Search", \
"Enables opening Leo AI Chat from Brave Search", \
kOsDesktop | kOsAndroid, \
FEATURE_VALUE_TYPE(ai_chat::features::kOpenAIChatFromBraveSearch), \
#define BRAVE_AI_CHAT_FEATURE_ENTRIES \
EXPAND_FEATURE_ENTRIES( \
{ \
"brave-ai-chat", \
"Brave AI Chat", \
"Summarize articles and engage in conversation with AI", \
kOsWin | kOsMac | kOsLinux | kOsAndroid, \
FEATURE_VALUE_TYPE(ai_chat::features::kAIChat), \
}, \
{ \
"brave-ai-chat-history", \
"Brave AI Chat History", \
"Enables AI Chat History persistence and management", \
kOsWin | kOsMac | kOsLinux, \
FEATURE_VALUE_TYPE(ai_chat::features::kAIChatHistory), \
}, \
{ \
"brave-ai-host-specific-distillation", \
"Brave AI Host-Specific Distillation", \
"Enables support for host-specific distillation scripts", \
kOsWin | kOsMac | kOsLinux, \
FEATURE_VALUE_TYPE(ai_chat::features::kCustomSiteDistillerScripts), \
}, \
{ \
"brave-ai-chat-context-menu-rewrite-in-place", \
"Brave AI Chat Rewrite In Place From Context Menu", \
"Enables AI Chat rewrite in place feature from the context menu", \
kOsDesktop, \
FEATURE_VALUE_TYPE(ai_chat::features::kContextMenuRewriteInPlace), \
}, \
{ \
"brave-ai-chat-page-content-refine", \
"Brave AI Chat Page Content Refine", \
"Enable local text embedding for long page content in order to " \
"find " \
"most relevant parts to the prompt within context limit.", \
kOsDesktop | kOsAndroid, \
FEATURE_VALUE_TYPE(ai_chat::features::kPageContentRefine), \
}, \
{ \
"brave-ai-chat-allow-private-ips", \
"Private IP Addresses for Custom Model Endpoints", \
"Permits the use of private IP addresses as model endpoint URLs", \
kOsWin | kOsMac | kOsLinux | kOsAndroid, \
FEATURE_VALUE_TYPE(ai_chat::features::kAllowPrivateIPs), \
}, \
{ \
"brave-ai-chat-open-leo-from-brave-search", \
"Open Leo AI Chat from Brave Search", \
"Enables opening Leo AI Chat from Brave Search", \
kOsDesktop | kOsAndroid, \
FEATURE_VALUE_TYPE(ai_chat::features::kOpenAIChatFromBraveSearch), \
})

#if BUILDFLAG(ENABLE_AI_REWRITER)
1 change: 1 addition & 0 deletions browser/ui/BUILD.gn
@@ -769,6 +769,7 @@ source_set("ui") {
"//brave/components/ai_chat/core/browser",
"//brave/components/ai_chat/core/common",
"//brave/components/ai_chat/core/common/mojom",
"//brave/components/ai_chat/resources/custom_site_distiller_scripts:generated_resources",
"//brave/components/ai_chat/resources/page:generated_resources",
"//brave/components/ai_rewriter/common/buildflags",
"//brave/components/brave_adaptive_captcha",
8 changes: 8 additions & 0 deletions components/ai_chat/core/common/features.cc
@@ -41,6 +41,14 @@ bool IsAIChatHistoryEnabled() {
return base::FeatureList::IsEnabled(features::kAIChatHistory);
}

BASE_FEATURE(kCustomSiteDistillerScripts,
"CustomSiteDistillerScripts",
base::FEATURE_DISABLED_BY_DEFAULT);

bool IsCustomSiteDistillerScriptsEnabled() {
return base::FeatureList::IsEnabled(features::kCustomSiteDistillerScripts);
}

BASE_FEATURE(kContextMenuRewriteInPlace,
"AIChatContextMenuRewriteInPlace",
base::FEATURE_ENABLED_BY_DEFAULT);
4 changes: 4 additions & 0 deletions components/ai_chat/core/common/features.h
@@ -38,6 +38,10 @@ COMPONENT_EXPORT(AI_CHAT_COMMON) BASE_DECLARE_FEATURE(kAIChatHistory);

COMPONENT_EXPORT(AI_CHAT_COMMON) bool IsAIChatHistoryEnabled();

COMPONENT_EXPORT(AI_CHAT_COMMON)
BASE_DECLARE_FEATURE(kCustomSiteDistillerScripts);
COMPONENT_EXPORT(AI_CHAT_COMMON) bool IsCustomSiteDistillerScriptsEnabled();

COMPONENT_EXPORT(AI_CHAT_COMMON)
BASE_DECLARE_FEATURE(kContextMenuRewriteInPlace);
COMPONENT_EXPORT(AI_CHAT_COMMON) bool IsContextMenuRewriteInPlaceEnabled();
1 change: 1 addition & 0 deletions components/ai_chat/renderer/BUILD.gn
@@ -20,6 +20,7 @@ static_library("renderer") {
"//base",
"//brave/components/ai_chat/core/common",
"//brave/components/ai_chat/core/common/mojom",
"//brave/components/ai_chat/resources/custom_site_distiller_scripts:generated_resources",
"//content/public/renderer",
"//gin",
"//mojo/public/cpp/bindings",
2 changes: 1 addition & 1 deletion components/ai_chat/renderer/page_content_extractor.cc
@@ -221,7 +221,7 @@ void PageContentExtractor::ExtractPageContent(
VLOG(1) << "Text transcript type";
// Do text extraction
DistillPageText(
render_frame(), isolated_world_id_,
render_frame(), global_world_id_, isolated_world_id_,
base::BindOnce(&PageContentExtractor::OnDistillResult,
weak_ptr_factory_.GetWeakPtr(), std::move(callback)));
}
95 changes: 95 additions & 0 deletions components/ai_chat/renderer/page_text_distilling.cc
@@ -6,6 +6,7 @@
#include "brave/components/ai_chat/renderer/page_text_distilling.h"

#include <iterator>
#include <map>
#include <memory>
#include <optional>
#include <queue>
@@ -17,6 +18,7 @@

#include "base/compiler_specific.h"
#include "base/containers/contains.h"
#include "base/containers/fixed_flat_map.h"
#include "base/containers/span.h"
#include "base/functional/bind.h"
#include "base/functional/callback.h"
@@ -27,7 +29,10 @@
#include "base/strings/utf_string_conversions.h"
#include "base/time/time.h"
#include "base/values.h"
#include "brave/components/ai_chat/core/common/features.h"
#include "brave/components/ai_chat/resources/custom_site_distiller_scripts/grit/custom_site_distiller_scripts_generated.h"
#include "content/public/renderer/render_frame.h"
#include "net/base/registry_controlled_domains/registry_controlled_domain.h"
#include "third_party/blink/public/mojom/script/script_evaluation_params.mojom-shared.h"
#include "third_party/blink/public/platform/web_string.h"
#include "third_party/blink/public/web/web_document.h"
Expand All @@ -39,6 +44,7 @@
#include "ui/accessibility/ax_node_data.h"
#include "ui/accessibility/ax_tree.h"
#include "ui/accessibility/ax_tree_update.h"
#include "ui/base/resource/resource_bundle.h"

namespace ai_chat {

@@ -139,8 +145,24 @@ void AddTextNodesToVector(const ui::AXNode* node,

void DistillPageText(
content::RenderFrame* render_frame,
int32_t global_world_id,
int32_t isolated_world_id,
base::OnceCallback<void(const std::optional<std::string>&)> callback) {
if (ai_chat::features::IsCustomSiteDistillerScriptsEnabled()) {
std::string host =
render_frame->GetWebFrame()->GetSecurityOrigin().Host().Utf8();
std::optional<std::pair<std::string, bool>> site_script =
LoadSiteScriptForHost(host);

if (site_script.has_value()) {
int32_t world_id =
site_script->second ? global_world_id : isolated_world_id;
DistillPageTextViaSiteScript(render_frame, site_script->first, world_id,
std::move(callback));
return;
}
}

auto snapshotter = render_frame->CreateAXTreeSnapshotter(
ui::AXMode::kWebContents | ui::AXMode::kHTML | ui::AXMode::kScreenReader);
ui::AXTreeUpdate snapshot;
@@ -204,4 +226,77 @@ void DistillPageText(
std::move(callback).Run(contents_text);
}

void DistillPageTextViaSiteScript(
content::RenderFrame* render_frame,
std::string_view script_content,
int32_t world_id,
base::OnceCallback<void(const std::optional<std::string>&)> callback) {
CHECK(ai_chat::features::IsCustomSiteDistillerScriptsEnabled());
// TODO (jonathansampson): Wrap scripts at build/transpile-time instead
// This produces an injected script that resembles the following:
// (() => {
// function distillPrimaryColumn (level) { ... }
// function distill(level) {
// return distillPrimaryColumn(level);
// }
// return distill(3);
// })()
std::string script = absl::StrFormat(
R"((()=> {
%s
return distill(3);
})())",
script_content);

blink::WebScriptSource source =
blink::WebScriptSource(blink::WebString::FromUTF8(script));

auto on_script_executed =
[](base::OnceCallback<void(const std::optional<std::string>&)> callback,
std::optional<base::Value> value, base::TimeTicks start_time) {
if (value && value->is_string() && !value->GetString().empty()) {
std::move(callback).Run(value->GetString());
} else {
std::move(callback).Run({});
}
};

// Execute the combined script as a single source
render_frame->GetWebFrame()->RequestExecuteScript(
world_id, base::span_from_ref(source),
blink::mojom::UserActivationOption::kDoNotActivate,
blink::mojom::EvaluationTiming::kAsynchronous,
blink::mojom::LoadEventBlockingOption::kDoNotBlock,
base::BindOnce(on_script_executed, std::move(callback)),
blink::BackForwardCacheAware::kAllow,
blink::mojom::WantResultOption::kWantResult,
// Because we are using a promise to resolve the result, we will use the
// `kAwait` option to ensure the promise is resolved before the callback
// is invoked.
blink::mojom::PromiseResultOption::kAwait);
}

std::optional<std::pair<std::string, bool>> LoadSiteScriptForHost(
std::string_view host) {
static constexpr auto kHostToScriptResource =
base::MakeFixedFlatMap<std::string_view, std::pair<int, bool>>({
{"github.com",
{IDR_CUSTOM_SITE_DISTILLER_SCRIPTS_GITHUB_COM_BUNDLE_JS, false}},
{"x.com", {IDR_CUSTOM_SITE_DISTILLER_SCRIPTS_X_COM_BUNDLE_JS, true}},
});

auto it = kHostToScriptResource.find(
net::registry_controlled_domains::GetDomainAndRegistry(
host, net::registry_controlled_domains::INCLUDE_PRIVATE_REGISTRIES));

if (it == kHostToScriptResource.end()) {
return std::nullopt;
}

return std::make_optional(std::make_pair(
ui::ResourceBundle::GetSharedInstance().LoadDataResourceString(
it->second.first),
it->second.second));
}

} // namespace ai_chat
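
Note: the lookup in LoadSiteScriptForHost can be sketched outside Chromium as follows. This is a minimal C++17 illustration, not the code in this commit. SiteScript, LastTwoLabels, FindSiteScript, CheckFindSiteScript and the resource ids 1 and 2 are hypothetical names and placeholders. In particular, LastTwoLabels is a naive stand-in for net::registry_controlled_domains::GetDomainAndRegistry and ignores the public suffix list.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical stand-in for a bundled script resource.
struct SiteScript {
  int resource_id;  // placeholder for a generated IDR_* id
  bool main_world;  // true: run in the page's main world
};

// Naive registrable-domain helper (assumption for illustration only): keeps
// the last two dot-separated labels. It is wrong for suffixes such as
// "co.uk", which is why the real code uses the registry-controlled-domain API.
constexpr std::string_view LastTwoLabels(std::string_view host) {
  const auto last_dot = host.rfind('.');
  if (last_dot == std::string_view::npos || last_dot == 0) {
    return host;
  }
  const auto prev_dot = host.rfind('.', last_dot - 1);
  return prev_dot == std::string_view::npos ? host : host.substr(prev_dot + 1);
}

// Returns the script entry for the host's domain, or std::nullopt when no
// host-specific script is registered.
inline std::optional<SiteScript> FindSiteScript(std::string_view host) {
  struct Entry {
    std::string_view domain;
    SiteScript script;
  };
  static constexpr Entry kTable[] = {
      {"github.com", {1, false}},
      {"x.com", {2, true}},
  };
  const std::string_view domain = LastTwoLabels(host);
  for (const Entry& entry : kTable) {
    if (entry.domain == domain) {
      return entry.script;
    }
  }
  return std::nullopt;
}

// Usage check for the sketch; call it from a test or from main().
inline void CheckFindSiteScript() {
  assert(FindSiteScript("x.com")->main_world);
  assert(!FindSiteScript("gist.github.com")->main_world);
  assert(!FindSiteScript("example.org").has_value());
}
```

Keying the table on the registrable domain rather than the full host means a subdomain resolves to the same entry as its parent domain, and an unknown host produces an early std::nullopt so the caller can fall back to the generic path.
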
18 changes: 18 additions & 0 deletions components/ai_chat/renderer/page_text_distilling.h
@@ -9,20 +9,38 @@
#include <cstdint>
#include <optional>
#include <string>
#include <utility>

#include "base/functional/callback_forward.h"
#include "url/gurl.h"

namespace content {
class RenderFrame;
}

namespace ai_chat {

// Distills the text content of a page. If possible, it will use a custom site
// distiller script. Otherwise, it will fall back to a more general approach.
void DistillPageText(
content::RenderFrame* render_frame,
int32_t global_world_id,
int32_t isolated_world_id,
base::OnceCallback<void(const std::optional<std::string>&)>);

// Attempts to retrieve a custom site distiller script for the given host.
// Returns a pair consisting of the script content and a boolean indicating
// whether it is intended for the main world.
std::optional<std::pair<std::string, bool>> LoadSiteScriptForHost(
std::string_view host);

// Attempts to distill a page based on the retrieval of a host-specific script.
void DistillPageTextViaSiteScript(
content::RenderFrame* render_frame,
std::string_view script_content,
int32_t world_id,
base::OnceCallback<void(const std::optional<std::string>&)>);

} // namespace ai_chat

#endif // BRAVE_COMPONENTS_AI_CHAT_RENDERER_PAGE_TEXT_DISTILLING_H_
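
Note: the dispatch described by the comments in this header (feature gate, script lookup, world selection, fallback) can be sketched as a pure function. This is a minimal C++17 illustration under stated assumptions, not the code in this commit. LoadedScript, Lookup, ChooseScriptWorld and CheckChooseScriptWorld are hypothetical names, and the world ids used in the check are arbitrary.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>

// Hypothetical result of a script lookup: script text plus main-world flag.
struct LoadedScript {
  std::string content;
  bool main_world;
};

// Hypothetical lookup callback, standing in for LoadSiteScriptForHost.
using Lookup = std::function<std::optional<LoadedScript>(const std::string&)>;

// Returns the world id in which a site script should run, or std::nullopt
// when the caller should fall back to the generic distillation path.
inline std::optional<int> ChooseScriptWorld(bool feature_enabled,
                                            const std::string& host,
                                            const Lookup& lookup,
                                            int global_world_id,
                                            int isolated_world_id) {
  if (!feature_enabled) {
    return std::nullopt;  // Feature off: the lookup is never consulted.
  }
  const std::optional<LoadedScript> script = lookup(host);
  if (!script) {
    return std::nullopt;  // No host-specific script: fall back.
  }
  return script->main_world ? global_world_id : isolated_world_id;
}

// Usage check for the sketch; call it from a test or from main().
inline void CheckChooseScriptWorld() {
  const Lookup lookup =
      [](const std::string& host) -> std::optional<LoadedScript> {
    if (host == "x.com") {
      return LoadedScript{"/* script */", true};
    }
    return std::nullopt;
  };
  assert(!ChooseScriptWorld(false, "x.com", lookup, 0, 7).has_value());
  assert(ChooseScriptWorld(true, "x.com", lookup, 0, 7) == std::optional<int>(0));
  assert(!ChooseScriptWorld(true, "example.org", lookup, 0, 7).has_value());
}
```

Returning an optional world id keeps the decision separate from script execution, so the fallback path stays the default whenever the feature is off or no script matches.
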
@@ -0,0 +1,29 @@
# Copyright (c) 2024 The Brave Authors. All rights reserved.
# This Source Code Form is subject to the terms of the Mozilla Public
# License, v. 2.0. If a copy of the MPL was not distributed with this file,
# You can obtain one at https://mozilla.org/MPL/2.0/.

import("//brave/components/common/typescript.gni")
import("//tools/grit/repack.gni")

transpile_web_ui("custom_site_distiller_scripts") {
resource_name = "custom_site_distiller_scripts"
visibility = [ ":*" ]
entry_points = [
[
"x_com",
rebase_path("scripts/x.com/index.ts"),
],
[
"github_com",
rebase_path("scripts/github.com/index.ts"),
],
]
output_module = true
}

pack_web_resources("generated_resources") {
resource_name = "custom_site_distiller_scripts"
output_dir = "$root_gen_dir/brave/components/ai_chat/resources/custom_site_distiller_scripts"
deps = [ ":custom_site_distiller_scripts" ]
}
@@ -0,0 +1,11 @@
/* Copyright (c) 2024 The Brave Authors. All rights reserved.
* This Source Code Form is subject to the terms of the Mozilla Public
* License, v. 2.0. If a copy of the MPL was not distributed with this file,
* You can obtain one at https://mozilla.org/MPL/2.0/. */

export enum LEO_DISTILLATION_LEVEL {
LOW = 0,
MEDIUM = 1,
HIGH = 2,
FULL = 3
}
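
Note: page_text_distilling.cc wraps the bundled script in an immediately invoked function and calls distill(3). The literal 3 presumably corresponds to FULL in the enum above, although the commit does not say so explicitly. The following is a minimal C++17 sketch of that wrapping with the level spelled out, not the code in this commit. DistillationLevel, WrapDistillerScript and CheckWrapDistillerScript are hypothetical names, and plain string concatenation is used here in place of absl::StrFormat.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Mirrors the values of the TypeScript LEO_DISTILLATION_LEVEL enum.
enum class DistillationLevel : int {
  kLow = 0,
  kMedium = 1,
  kHigh = 2,
  kFull = 3,
};

// Wraps a script body in an arrow-function IIFE that returns the result of
// calling distill() with the numeric value of the requested level.
inline std::string WrapDistillerScript(std::string_view script_content,
                                       DistillationLevel level) {
  std::string script = "(() => {\n";
  script.append(script_content);
  script += "\nreturn distill(";
  script += std::to_string(static_cast<int>(level));
  script += ");\n})()";
  return script;
}

// Usage check for the sketch; call it from a test or from main().
inline void CheckWrapDistillerScript() {
  const std::string wrapped = WrapDistillerScript(
      "function distill(level) { return 'ok'; }", DistillationLevel::kFull);
  assert(wrapped ==
         "(() => {\nfunction distill(level) { return 'ok'; }\n"
         "return distill(3);\n})()");
}
```

Naming the level on the C++ side would keep the injected call in step with the TypeScript enum if its values ever change.
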