Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ocrd_utils.initLogging: also add handler to root logger #1288

Merged
merged 8 commits into from
Nov 18, 2024
Merged

Conversation

bertsky
Copy link
Collaborator

@bertsky bertsky commented Oct 9, 2024

As a follow-up to ccb416b, but more general:

initLogging should always affect the root logger '', too. (Not just when disabling logging.)

Reasons:

  • to be consistent with file config (which by default also configures the unqualified root logger for our default handlers
  • to prevent imported libraries from initializing logging on their own (like import tensorflow) – the place to check whether logging has been set up yet is always getLogger('').hasHandlers(), also in Python's logging.basicConfig BTW

I know we deliberately changed this to exclude the root logger earlier. But I cannot find the reasoning behind that anymore. Maybe "we are a library, not an application"? Well, I agree actually, but then just don't call initLogging. (Perhaps at that time we got distracted by the fact that we still did call initLogging unconditionally when importing ocrd_utils.logging.)

BTW, IMO most places (core and modules code) can get their loggers from Python stdlib's logging.getLogger instead of our ocrd_utils.logging.getLogger, because we don't add anything to that nowadays.

…tent with file config and prevent imported libraries from initing logging first), but disable propagation for ocrd loggers (to avoid duplication)
@bertsky bertsky requested a review from kba October 9, 2024 16:53
@MehmedGIT
Copy link
Contributor

MehmedGIT commented Oct 10, 2024

Maybe "we are a library, not an application"? Well, I agree actually, but then just don't call initLogging.

But if initLogging is not invoked then the logging is (maybe it's a was now, not sure) not working at all when core code is imported in another project say Operandi. Ideally, the behavior of the root logger, ocrd and ocrd_network should not be affected by the default logging config file if the user does not do it specifically with a manual logging config file. Otherwise, it bugs programmatically set file handlers and log levels (in ocrd_network at least).

This said, the ocrd_network logging acts weirdly and I am not able to find out how to fix it. I did some experiments and observations that I am going to share in next Monday's meeting.

@kba
Copy link
Member

kba commented Oct 10, 2024

I know we deliberately changed this to exclude the root logger earlier. But I cannot find the reasoning behind that anymore. Maybe "we are a library, not an application"? Well, I agree actually, but then just don't call initLogging. (Perhaps at that time we got distracted by the fact that we still call initLogging unconditionally when importing ocrd_utils.logging.)

Yes, the idea was to behave unintrusively as a library but as you point out, in practice we don't

BTW, IMO most places (core and modules code) can get their loggers from Python stdlib's logging.getLogger instead of our ocrd_utils.logging.getLogger, because we don't add anything to that nowadays.

Agreed.

This said, the ocrd_network logging acts weirdly and I am not able to find out how to fix it. I did some experiments and observations that I am going to share in next Monday's meeting.

🙏

Let's discuss on Monday, I'm all for pushing for a clean logging because that is high on the list of most painful aspects of the OCR-D software at the moment.

@bertsky
Copy link
Collaborator Author

bertsky commented Oct 10, 2024

@MehmedGIT

But if initLogging is not invoked then the logging is (maybe it's a was now, not sure) not working at all when core code is imported in another project say Operandi.

What do you mean not working at all? Operandi can have its own logging.basicConfig or logging.config.fileConfig or whatever. All OCR-D components simply provide their messages on their respective loggers, the application decides what to do with them (creating handlers and setting levels).

If the application is a standalone OCR-D CLI (→ocrd_utils.initLogging) though, then no other components (think Tensorflow) should interfere with logging setup (i.e. handlers+levels). Hence this PR.

Ideally, the behavior of the root logger, ocrd and ocrd_network should not be affected by the default logging config file if the user does not do it specifically with a manual logging config file. Otherwise, it bugs programmatically set file handlers and log levels (in ocrd_network at least).

Sry, I don't understand that at all.

@bertsky
Copy link
Collaborator Author

bertsky commented Oct 10, 2024

@kba

(Perhaps at that time we got distracted by the fact that we still call initLogging unconditionally when importing ocrd_utils.logging.)

Yes, the idea was to behave unintrusively as a library but as you point out, in practice we don't

Actually, I was trying to point out that we do.

We do not call initLogging at module-import time anymore. Which is correct IMO. But then we can add a handler to the root logger when we do call it.

@MehmedGIT
Copy link
Contributor

What do you mean not working at all?

Not logging anything, and if initLogging is not called, the initialization phase of the Operandi Server just freezes.

Operandi can have its own logging.basicConfig or logging.config.fileConfig or whatever.

Yes, but there is some kind of interference because of the ocrd_logging.conf coming from the core. I have to manually provide ocrd_logging.conf to avoid some side effects but still not completely.

I am talking about the experience I had in the past. Maybe these issues are no longer there. I am not sure, because I have my hack to reconfigure the loggers at run-time after the initLogging call to prevent any undesired side effects.

Sry, I don't understand that at all.

I was talking about the ocrd_logging.conf file inside ocrd_utils.

@bertsky
Copy link
Collaborator Author

bertsky commented Oct 10, 2024

Not logging anything, and if initLogging is not called, the initialization phase of the Operandi Server just freezes.

I will try to unwrap: Operandi on startup calls ocrd_utils.initLogging. So you are saying that

  1. without ocrd_utils.initLogging, nothing gets logged at all.
    I would assume that Python will automatically do a logging.basicConfig – unless perhaps some rogue library you are using already inits logging (like our Tensorflow example). So you should normally at least get warnings. Have you tried instrumenting to see whether getLogger('').handlers shows anything?
  2. without ocrd_utils.initLogging, the server (i.e. probably on_startup_event) blocks.
    That's odd. What does the stack trace show in that state?

Also, have you tried calling logging.basicConfig or logging.config.fileConfig instead of the ocrd_utils version?

Yes, but there is some kind of interference because of the ocrd_logging.conf coming from the core. I have to manually provide ocrd_logging.conf to avoid some side effects but still not completely.

I don't understand how our package data log config can interfere at all, unless ocrd_utils.initLogging is called.

It sounds like what you describe has been observed when core would still do initLogging at import time (which was clearly an antipattern and it does not anymore).

I am talking about the experience I had in the past. Maybe these issues are no longer there.

I would hope so. Could you please try it out – perhaps even with and without this change here?

I am not sure, because I have my hack to reconfigure the loggers at run-time after the initLogging call to prevent any undesired side effects.

Ok, that's quite involved (I don't fully understand the InterceptHandler).

Side effects being handlers and levels you don't want? (So it could also be the same issue as above?)

Ideally, the behavior of the root logger, ocrd and ocrd_network should not be affected by the default logging config file if the user does not do it specifically with a manual logging config file. Otherwise, it bugs programmatically set file handlers and log levels (in ocrd_network at least).
(I was talking about the ocrd_logging.conf file inside ocrd_utils.)

So you were merely saying that an optional file config should not interfere with programmatic setup?

I agree – and that's what initLogging in its current form seems to achieve. (Just that the programmatic setup is missing the root logger, hence this PR.)

@MehmedGIT
Copy link
Contributor

MehmedGIT commented Oct 10, 2024

I don't understand how our package data log config can interfere at all, unless ocrd_utils.initLogging is called.

Because it has to be called when using core methods, and there is no workaround to avoid it (may no longer be valid).

It sounds like what you describe has been observed when core would still do initLogging at import time (which was clearly an antipattern and it does not anymore).

That may be the case, yes. I will have to verify again.

Side effects being handlers and levels you don't want? (So it could also be the same issue as above?)

Yes, absolutely. Take for instance just core (the latest version) - the ocrd_network.ProcessingWorker. From observations - the log level in the constructor seems to be ERROR or WARNING (not INFO) since I can only see messages with that log level forwarded to the logging file of the specific worker. However, the log level of ocrd_network.ProcessingWorker.publish_result_to_all method is INFO. Because these are normally forwarded to the log file. So when the default ocrd_logging.conf is used, although it sets ocrd_network logger level to INFO, it is not INFO everywhere I would expect it to be. The worst part is that I am not sure why.

The only place where everything is logged as expected to a file is the ocrd_network.process_helpers.invoke_processor() because of a new context manager, check here.

So you were merely saying that an optional file config should not interfere with programmatic setup?

Yes. It is not that optional in the end since core provides a default for you and the only way to avoid that is to provide a manual logging configuration file.

@bertsky
Copy link
Collaborator Author

bertsky commented Nov 7, 2024

@kba @MehmedGIT in the last 6 commits, I rewrote the entire thing. I also made sure we have no side effects between tests anymore (which haunted me for quite a while).

It makes much more sense to me this way – and it works under all circumstances. Please let me know what you think. We can also defer this to v3 if deemed too radical.

@MehmedGIT
Copy link
Contributor

Seems some adaptations are needed inside ocrd_network module as well. Here specifically. The processing job logs have many exceptions such as shown below. Just removing the initLogging call does not help. Any ideas?

--- Logging error ---
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/logging/__init__.py", line 1089, in emit
    self.flush()
  File "/usr/local/lib/python3.8/logging/__init__.py", line 1069, in flush
    self.stream.flush()
OSError: [Errno 5] Input/output error
Call stack:
  File "/home/mm/venv38-all/bin/ocrd-cis-ocropy-denoise", line 8, in <module>
    sys.exit(ocrd_cis_ocropy_denoise())
  File "/home/mm/venv38-all/lib/python3.8/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/mm/venv38-all/lib/python3.8/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/mm/venv38-all/lib/python3.8/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/mm/venv38-all/lib/python3.8/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/mm/venv38-all/lib/python3.8/site-packages/ocrd_cis/ocropy/cli.py", line 28, in ocrd_cis_ocropy_denoise
    return ocrd_cli_wrap_processor(OcropyDenoise, *args, **kwargs)
  File "/home/mm/repos/core/build/__editable__.ocrd-2.69.0-py3-none-any/ocrd/decorators/__init__.py", line 71, in ocrd_cli_wrap_processor
    check_and_run_network_agent(processorClass, subcommand, address, database, queue)
  File "/home/mm/repos/core/build/__editable__.ocrd-2.69.0-py3-none-any/ocrd/decorators/__init__.py", line 196, in check_and_run_network_agent
    processing_worker.start_consuming()
  File "/home/mm/repos/core/build/__editable__.ocrd-2.69.0-py3-none-any/ocrd_network/processing_worker.py", line 134, in start_consuming
    self.rmq_consumer.start_consuming()
  File "/home/mm/repos/core/build/__editable__.ocrd-2.69.0-py3-none-any/ocrd_network/rabbitmq_utils/consumer.py", line 45, in start_consuming
    self._channel.start_consuming()
  File "/home/mm/venv38-all/lib/python3.8/site-packages/pika/adapters/blocking_connection.py", line 1883, in start_consuming
    self._process_data_events(time_limit=None)
  File "/home/mm/venv38-all/lib/python3.8/site-packages/pika/adapters/blocking_connection.py", line 2044, in _process_data_events
    self.connection.process_data_events(time_limit=time_limit)
  File "/home/mm/venv38-all/lib/python3.8/site-packages/pika/adapters/blocking_connection.py", line 851, in process_data_events
    self._dispatch_channel_events()
  File "/home/mm/venv38-all/lib/python3.8/site-packages/pika/adapters/blocking_connection.py", line 567, in _dispatch_channel_events
    impl_channel._get_cookie()._dispatch_events()
  File "/home/mm/venv38-all/lib/python3.8/site-packages/pika/adapters/blocking_connection.py", line 1510, in _dispatch_events
    consumer_info.on_message_callback(self, evt.method,
  File "/home/mm/repos/core/build/__editable__.ocrd-2.69.0-py3-none-any/ocrd_network/processing_worker.py", line 110, in on_consumed_message
    self.process_message(processing_message=processing_message)
  File "/home/mm/repos/core/build/__editable__.ocrd-2.69.0-py3-none-any/ocrd_network/processing_worker.py", line 187, in process_message
    invoke_processor(
  File "/home/mm/repos/core/build/__editable__.ocrd-2.69.0-py3-none-any/ocrd_network/process_helpers.py", line 35, in invoke_processor
    run_processor(
  File "/home/mm/repos/core/build/__editable__.ocrd-2.69.0-py3-none-any/ocrd/processor/helpers.py", line 84, in run_processor
    log.debug("Running processor %s", processorClass)
Message: 'Running processor %s'
Arguments: (<class 'ocrd_cis.ocropy.denoise.OcropyDenoise'>,)

@bertsky
Copy link
Collaborator Author

bertsky commented Nov 11, 2024

@MehmedGIT

Seems some adaptations are needed inside ocrd_network module as well. Here specifically. The processing job logs have many exceptions such as shown below. Just removing the initLogging call does not help. Any ideas?

I may have broke force_reinit (as there is no cleanup during initLogging anymore). Perhaps I should call disableLogging if both flags (initialized_flag and force_reinit) are true...

@bertsky
Copy link
Collaborator Author

bertsky commented Nov 11, 2024

Can you try with 192895a again?

@MehmedGIT
Copy link
Contributor

Can you try with 192895a again?

Works fine now. I can also see much more logging with DEBUG level on the processing jobs than before. The mets server's log file is not logging after the initialization though.

14:44:51.694 INFO ocrd.models.ocrd_mets.server./tmp/ocrd_network_sockets/_home_mm_repos_ocrd_network_tests_ws29_data.sock - Starting up METS server
14:44:51.704 INFO uvicorn.error - Started server process [100794]
14:44:51.704 INFO uvicorn.error - Waiting for application startup.
14:44:51.704 INFO uvicorn.error - Application startup complete.
14:44:51.704 INFO uvicorn.error - Uvicorn running on unix socket /tmp/ocrd_network_sockets/_home_mm_repos_ocrd_network_tests_ws29_data.sock (Press CTRL+C to quit)

@bertsky
Copy link
Collaborator Author

bertsky commented Nov 11, 2024

The mets server's log file is not logging after the initialization though.

What do you expect to see? Our logger is not used for anything beyond DEBUG level. And we do not set log_config of Uvicorn (and even disable its access_log), so I would expect nothing.

@bertsky
Copy link
Collaborator Author

bertsky commented Nov 11, 2024

See this recipe for how we could include more messages from Uvicorn.

@MehmedGIT
Copy link
Contributor

The mets server's log file is not logging after the initialization though.

What do you expect to see? Our logger is not used for anything beyond DEBUG level. And we do not set log_config of Uvicorn (and even disable its access_log), so I would expect nothing.

Ah, okay. I am expecting to see the INFO level logs of the mets_server.

@bertsky
Copy link
Collaborator Author

bertsky commented Nov 11, 2024

I am expecting to see the INFO level logs of the mets_server.

Are you perhaps still using a METS Server installation before your extensive logging change? (That would explain why you get Starting up instead of Configuring...)


#
# logger tensorflow
#
[logger_ocrd_tensorflow]
level=ERROR
handlers=consoleHandler
handlers=
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the semantics of setting handlers to an empty value? It's not ovious from https://docs.python.org/3/library/logging.config.html for me - I assume it's incremental, i.e. does not explicitly add a handler but does not remove existing handlers either?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depends on how fileConfig gets run. The default is disable_existing_loggers=True, i.e. before parsing, all loggers (and their handlers) are removed, as done in our disableLogging. See documentation.

Still, what was surprising to me is that one must add a handlers statement, it cannot be omitted – even if one just wants the messages to get forwarded up in the hierarchy without extra handlers.

The config file is documented here BTW.

@kba kba merged commit 67263f8 into master Nov 18, 2024
22 checks passed
@kba kba deleted the initlogging-root branch November 20, 2024 12:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

processing with --log-level override does not apply anymore
3 participants