Slow performance of crawl4AI in Docker compared to pip installation outside Docker environment #329
Comments
@QuangTQV Can you share the specs with me? Is it AMD or ARM, how much memory have you assigned to your Docker container, and which hardware are you running it on? I'm curious to know.
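For reference, one way to gather those details from inside the container is a short script along these lines. This is a minimal sketch, not part of crawl4ai; the cgroup paths are the standard v2 and v1 locations and may differ on other setups.

```python
# Hedged sketch (not part of crawl4ai): report the CPU architecture, visible
# CPUs, and the memory limit Docker assigned, from inside the container.
import os
import platform


def container_specs() -> None:
    print("Architecture:", platform.machine())  # e.g. x86_64 (AMD) or aarch64 (ARM)
    print("CPUs visible:", os.cpu_count())

    # Memory limit, if one was set; paths cover cgroup v2 and cgroup v1.
    for path in ("/sys/fs/cgroup/memory.max",
                 "/sys/fs/cgroup/memory/memory.limit_in_bytes"):
        try:
            raw = open(path).read().strip()
        except FileNotFoundError:
            continue
        if raw == "max" or int(raw) >= 2**60:
            print("Memory limit: unlimited")
        else:
            print(f"Memory limit: {int(raw) / 2**30:.1f} GiB")
        break


if __name__ == "__main__":
    container_specs()
```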
I'm mistaken, sorry.
@QuangTQV The code below is how to use the new version:

```python
import asyncio

from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig, CacheMode
from crawl4ai.content_filter_strategy import PruningContentFilter
from crawl4ai.markdown_generation_strategy import DefaultMarkdownGenerator


async def main():
    # Configure the browser settings
    browser_config = BrowserConfig(
        headless=True,
        verbose=True,
        user_agent_mode="random",
    )

    # Set run configurations, including cache mode and markdown generator
    crawl_config = CrawlerRunConfig(
        cache_mode=CacheMode.BYPASS,
        markdown_generator=DefaultMarkdownGenerator(
            # content_filter=PruningContentFilter(threshold=0.48, threshold_type="fixed", min_word_threshold=0),
            # options={"ignore_links": True}
        )
    )

    async with AsyncWebCrawler(browser_config=browser_config) as crawler:
        result = await crawler.arun(
            url='https://www.kidocode.com/degrees/technology',
            config=crawl_config
        )

        if result.success:
            print("Raw Markdown Length:", len(result.markdown_v2.raw_markdown))
            print("Citations Markdown Length:", len(result.markdown_v2.markdown_with_citations))
            # Fit markdown exists if you pass a content filter
            # print("Fit Markdown Length:", len(result.markdown_v2.fit_markdown))


if __name__ == "__main__":
    asyncio.run(main())
```
I am encountering slow performance when using crawl4AI in a Docker environment, whereas when I test it outside of Docker using the regular pip installation, the speed is significantly faster. Could there be any configuration or environment issues causing this discrepancy in performance? Please let me know if there are any errors or optimizations I may have overlooked.
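To narrow down where the time goes, a small timing harness run both inside and outside the container puts numbers on the gap. The sketch below is only an assumption-laden example that reuses the API style from the code posted above; the URL and run count are placeholders. One common cause of headless-browser slowdowns in containers is a small /dev/shm, so launching the container with a larger shared-memory segment (e.g. `docker run --shm-size=1g ...`) may also be worth trying.

```python
# Hedged sketch: time the same crawl inside and outside Docker. It reuses the
# API style from the example above; the URL and run count are placeholders.
import asyncio
import time

from crawl4ai import AsyncWebCrawler, BrowserConfig, CacheMode, CrawlerRunConfig


async def timed_crawl(url: str, runs: int = 3) -> None:
    browser_config = BrowserConfig(headless=True, verbose=False)
    # Bypass the cache so every run measures real network + rendering work.
    crawl_config = CrawlerRunConfig(cache_mode=CacheMode.BYPASS)

    async with AsyncWebCrawler(browser_config=browser_config) as crawler:
        for i in range(runs):
            start = time.perf_counter()
            result = await crawler.arun(url=url, config=crawl_config)
            elapsed = time.perf_counter() - start
            print(f"run {i + 1}: success={result.success}, elapsed={elapsed:.2f}s")


if __name__ == "__main__":
    asyncio.run(timed_crawl("https://www.kidocode.com/degrees/technology"))
```

Comparing the per-run times from the two environments should show whether the slowdown is in browser startup, page rendering, or the network path available to the container.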