Chapter 2 example error #66

Open
jhancock1229 opened this issue Oct 31, 2022 · 1 comment

When I try to run the code at the end of Chapter 2, I get the following error:

WARNING:google.auth.compute_engine._metadata:Compute Engine Metadata server unavailable on attempt 1 of 3. Reason: timed out
WARNING:google.auth.compute_engine._metadata:Compute Engine Metadata server unavailable on attempt 2 of 3. Reason: timed out
WARNING:google.auth.compute_engine._metadata:Compute Engine Metadata server unavailable on attempt 3 of 3. Reason: timed out
WARNING:google.auth._default:Authentication failed using Compute Engine authentication due to unavailable metadata server.
WARNING:apache_beam.internal.gcp.auth:Unable to find default credentials to use: Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://cloud.google.com/docs/authentication/getting-started
Connecting anonymously.

I know it's related to pulling kinglear.txt from Google Cloud Storage. Any tips on how to resolve this? By the way, here is the source code I copied out of the book:

import re
import apache_beam as beam
from apache_beam.io import ReadFromText
from apache_beam.io import WriteToText
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.options.pipeline_options import SetupOptions

input_file = "gs://dataflow-samples/shakespeare/kinglear.txt"
output_file = "~/coding/machine-learning/output.txt"

pipeline_options = PipelineOptions()

with beam.Pipeline(options=pipeline_options) as p:
    lines = p | ReadFromText(input_file)
    counts = (
        lines
        | 'Split' >> beam.FlatMap(lambda x: re.findall(r'[A-Za-z\']+', x))
        | 'PairWithOne' >> beam.Map(lambda x: (x, 1))
        | 'GroupAndSum' >> beam.CombinePerKey(sum)
    )
    def format_result(word_count):
        (word, count) = word_count
        return "{}: {}".format(word, count)
    
    output = counts | 'Format' >> beam.Map(format_result)

    output | WriteToText(output_file)
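
Side note on the output path in the code above: `~` is expanded by the shell, not by Python's file APIs, so a path like "~/coding/machine-learning/output.txt" may end up under a directory literally named `~` in the current working directory. A minimal sketch of expanding it explicitly, assuming the home directory was the intended location; `resolve_output_prefix` is a hypothetical helper name, not something from the book or from Beam:

```python
import os


def resolve_output_prefix(path: str) -> str:
    """Expand a leading '~' and return an absolute path (hypothetical helper)."""
    # expanduser replaces '~' with the current user's home directory when it
    # can be determined; abspath then normalizes the result.
    return os.path.abspath(os.path.expanduser(path))


# Same path as in the issue, resolved before it is handed to the pipeline.
output_file = resolve_output_prefix("~/coding/machine-learning/output.txt")
print(output_file)
```

The resolved value can then be passed to WriteToText(output_file) exactly as in the original code.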

MicAnt64 commented Dec 3, 2022

Hi jhancock1229,

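Those are not errors; they are warnings, and I get the same ones. Beam could not find Google Cloud credentials, so it connected anonymously (the last line of your log), which was enough to read this sample file. If I enter the following in a new cell: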
!head output.txt*

I get:
KING: 243
LEAR: 236
DRAMATIS: 1
PERSONAE: 1
king: 65
of: 447
Britain: 2
OF: 15
FRANCE: 10
DUKE: 3
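
If the warnings are just noise for a local run, one option is to raise the log level of the loggers that emit them. This is a rough sketch, not something from the book: the logger names are copied from the warning lines in the issue, `quiet_credential_warnings` is a hypothetical helper name, and it assumes the call happens before the pipeline is built.

```python
import logging

# Logger names taken from the "WARNING:<logger name>:..." lines in the issue.
NOISY_LOGGERS = (
    "google.auth.compute_engine._metadata",
    "google.auth._default",
    "apache_beam.internal.gcp.auth",
)


def quiet_credential_warnings(level: int = logging.ERROR) -> None:
    """Hide messages below `level` from the credential-related loggers."""
    for name in NOISY_LOGGERS:
        logging.getLogger(name).setLevel(level)


# Call this before creating beam.Pipeline(...).
quiet_credential_warnings()
```

This only hides the messages; the pipeline still connects anonymously. For data that is not publicly readable, the log message itself points to setting GOOGLE_APPLICATION_CREDENTIALS instead.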
