
Question: App-Metrics on success or failure of an invocation #245

Open
about-code opened this issue May 19, 2018 · 14 comments

Comments

@about-code

about-code commented May 19, 2018

I am currently studying the MicroProfile APIs, and I very much like most of the specifications and how approachable the documentation is (mostly). I have no hands-on experience so far, but I am curious to find out how it could help solve some of the requirements I see at work. Something I couldn't find much about is how metrics behave when an intercepted method throws an exception. From that I guess there is currently no way to decide whether a counter metric should be incremented based on the result of an invocation.

This is the business-case I see:

Let's say I have a standard CRUD REST API for a customer relationship management system. There's a method createCustomer(Customer cust) which validates a customer and throws a validation exception if the given object contains invalid data.

Now I can imagine different metrics beyond just how often createCustomer() was invoked:

  • how often did createCustomer() method succeed (how often were customers actually created)?
  • how often did createCustomer() fail ...
    • in general
    • with a particular exception (e.g. knowing that it often fails with validation exceptions can provide a hint at missing documentation/help or potential threats from intentionally sending invalid data)

I see that there are various other ways in which something similar could be achieved, for example by obtaining such metrics within the API layer rather than in the business core. Nevertheless, I thought I'd ask what the opinions are on these use cases and whether it is possible to apply metrics (or not) based on the results of invocations.

@about-code about-code changed the title Question: App-Metrics and interceptor behavior on exceptions Question: App-Metrics on success or failure of an invocation May 19, 2018
@raymondlam
Contributor

Currently, MicroProfile Metrics does not have a simple way to track how often a method fails with different exceptions. I can bring this issue up on the next MP Metrics call and see what the opinions are. It seems doable from the API side (with annotations), but it may not be very configurable.

At the moment, tracking exceptions requires additional code in the business logic.

Naive example:

@Inject
MetricRegistry metricRegistry;
...
try {
  createCustomer(cust);
  metricRegistry.counter("createCustomer.success").inc();
} catch (Exception e) {
  // one counter per exception class; rethrow so callers still see the failure
  metricRegistry.counter("createCustomer.fail." + e.getClass().getName()).inc();
  throw e;
}

@about-code
Author

about-code commented May 24, 2018

@raymondlam Thank you for your answer and offer to bring the idea up to a MP Metrics call.

What I could think of is something like this: @Metrics(type=MyCustomMetric.class), where MyCustomMetric implements an interface

public interface CustomMetric {
    public void onBeforeEach(MetricRegistry mr);
    public void onAfterEach(MetricRegistry mr);
    public void onAfterSuccess(MetricRegistry mr);
    public void onAfterError(MetricRegistry mr, Exception e);
}

Then we could implement the imperative logic from your example outside the annotated business method, in the respective callback methods. An @Metrics interceptor implementation calling them could then look roughly like:

Object result = null;
Exception error = null;

// get an instance of a CustomMetric impl. declared by the annotation
CustomMetric customMetric = getCustomMetricFromAnnotationOf(method);
try {
   customMetric.onBeforeEach(mr);   // mr: the injected MetricRegistry
   result = method.proceed();       // i.e. InvocationContext#proceed()
} catch (Exception e) {
   error = e;
   throw e;
} finally {
   if (error != null) {
      customMetric.onAfterError(mr, error);
   } else {
      // alternatively: customMetric.onAfterSuccess(mr, result);
      customMetric.onAfterSuccess(mr);
   }
   customMetric.onAfterEach(mr);
}

Related Work

This draft is based on an @Metrics interceptor, though. There's currently a question and debate in #215 about why @Metrics is an interceptor and whether the @InterceptorBinding should be kept. According to a comment from @mkouba, there was a meeting in December 2017 where a decision was made to remove it.

Addendum: More Advanced Use Cases:

Sometimes it could also be interesting to pass the non-exceptional result of an invocation. Then something like onSuccess(MetricRegistry mr, T result) could be imagined, but I guess generics and annotations can't be made statically analysable (I may be wrong). Another use case might be a product owner who wants to know what kind of customers have been created. Then we could even think of passing not just return values to these custom metric types but also call parameter values. However, currently I can't imagine a type-safe solution for these more powerful features, at least not one based on generics. Though I don't know what could be done with the Java Annotation Processing API to instruct the compiler.


Edit: First draft meant to refer to MetricRegistry. Non-existing MetricService replaced.
Edit: Reference to a comment about a December 2017 meeting regarding interceptor binding

@pilhuhn
Contributor

pilhuhn commented May 29, 2018

What about a change that implicitly works with labels?

@Counted(name="bla")
@Counted(name="bla", onFail)
void myMethod() { ... }

which would then end up in a metric "bla" without a label for all invocations, and another metric "bla" with an additional label "_reason=fail" that only gets bumped in the exception case.

This could also have a 'success' attribute

@Counted(name="bla", onSuccess)

which then creates a label "_reason=success" that only gets bumped when the method does not throw an exception.

Actually, the attributes could also be exit=EXIT_ENUM, with EXIT_ENUM being { ALL /* default */, SUCCESS, FAIL }

Btw: is this limited to @Counted(monotonic=true) (or the upcoming @HitCounted)? I don't think it makes too much sense for a Gauge, but for histograms and the like it could make sense too.

Also: should e.g @Counted(name="bla", onFail) include the ALL case and thus also always bump the counter for the total number of invocations?

EDIT: I did not get that you want to catch and count individual Exceptions thrown out of the called method.
In this case there could be a (different) label for each exception _exception=e.getClass().getName().
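
For illustration, the exposed metrics might then look roughly like this (a hypothetical Prometheus-style rendering; metric names and values are invented):

bla 42
bla{_reason="fail"} 5
bla{_reason="fail",_exception="javax.validation.ValidationException"} 3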

@pilhuhn
Contributor

pilhuhn commented May 29, 2018

@about-code How do you envision dealing with Exceptions that wrap other exceptions (at arbitrarily deep levels) ?

@about-code
Author

@pilhuhn Thank you for your comments and thoughts.

What about a change that implicitly works with labels

I think that, compared to the interface I described earlier, your attribute-based approach would probably be almost equally expressive. It would allow counting things differently based on the coarse-grained result (success/failure).

The exception case is interesting, though, and could be a differentiator.

I did not get that you want to catch and count individual Exceptions thrown out of the called method.
In this case there could be a (different) label for each exception _exception=e.getClass().getName()
[...]
How do you envision dealing with Exceptions that wrap other exceptions (at arbitrarily deep levels) ?

If your proposal is to let an MP @Counted metric generate the _exception label on failure, then I don't see how I could answer your question. If, instead, your proposal is that an application should generate the label imperatively within a business method when it catches the exception, then this is something I'd like to avoid, so as not to leak metrics logic into business logic.

With the interface above, I'd imagine an app developer naively implementing the onAfterError callback such that it traverses Exception#getCause(). The implementation could then generate a label for each exception along the way, if this is what the application needs (and if this is possible with the metrics API). Being interested in arbitrarily deep exception chains might not be a very common use case. I think it is more likely to find applications which wrap their specific exceptions into some generic MyFooApplicationException; these applications could be interested in the immediate cause, at least. An MP Metrics implementation can't guess these specifics, though. So an interface-based programming model could have the advantage of being open to an implementation by the application itself, at the price of more boilerplate compared to your approach.
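
A minimal sketch of such a callback implementation (CauseChainMetric is a hypothetical name; it assumes the CustomMetric interface proposed above and the imperative MP Metrics counter API):

import org.eclipse.microprofile.metrics.MetricRegistry;

public class CauseChainMetric implements CustomMetric {
    @Override public void onBeforeEach(MetricRegistry mr) { }
    @Override public void onAfterEach(MetricRegistry mr) { }
    @Override public void onAfterSuccess(MetricRegistry mr) { }

    @Override
    public void onAfterError(MetricRegistry mr, Exception e) {
        // count the thrown exception and every wrapped cause along the chain
        for (Throwable t = e; t != null; t = t.getCause()) {
            mr.counter("createCustomer.fail." + t.getClass().getName()).inc();
        }
    }
}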

@about-code
Author

about-code commented May 29, 2018

Return Values and Call Parameters for Application Metrics?

In my interface proposal above I intentionally skipped most of the things which would be even more interesting from a business perspective than just success, failure, or particular exception types (please let me know whether I should open another issue if you feel this is getting off topic).

Example: Deriving Metrics from returned values

Imagine we extend the above interface to something like

public interface CustomMetric<T_RESULT> {
    public void onBeforeEach(MetricRegistry mr);
    public void onAfterEach(MetricRegistry mr, T_RESULT result);
    public void onAfterSuccess(MetricRegistry mr, T_RESULT result);
    public void onAfterError(MetricRegistry mr, Exception e);
}

where T_RESULT is the return type of the method. Then an interceptor could also pass the result of a successful computation. An application providing the implementation could then create arbitrary labels based on the result value using the imperative MP Metrics API.

An interface with a T_RESULT only makes sense with method annotations. @Metrics as proposed earlier might turn out not to be such a good choice as an interceptor because it is also applicable to other element types. So a new @CustomMetric or @AppMetric annotation with an interceptor binding could make sense; then @Metrics would not need to remain an interceptor (see #215). An application interested in providing business metrics about the most frequently read customer records could then, for example, write:

@Metrics
@CustomMetric(type=MyPopularCustomersMetric.class)
public Customer readCustomer(int custId) {...}

MyPopularCustomersMetric would be an implementation of CustomMetric<Customer> and would be able to inspect the customer instance returned by the annotated method in its onAfterEach or onAfterSuccess callback. It would use the MP Metrics API to add metrics about the customer. From a type-safety perspective, the only problem I see is that it doesn't seem possible to prevent people from using some CustomMetric<Customer> on a method whose return type isn't Customer:

@Metrics
@CustomMetric(type=MyPopularCustomersMetric.class)
public Dogs readDogs(int animalId) {...}

We'd probably need an annotation processor for @CustomMetric which compares the generic parameter T_RESULT of a CustomMetric#type to the return type of the method annotated with @CustomMetric.
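
A minimal sketch of what MyPopularCustomersMetric could look like (assuming the generic CustomMetric<T_RESULT> interface above; Customer#getId() is a hypothetical accessor):

import org.eclipse.microprofile.metrics.MetricRegistry;

public class MyPopularCustomersMetric implements CustomMetric<Customer> {
    @Override public void onBeforeEach(MetricRegistry mr) { }
    @Override public void onAfterEach(MetricRegistry mr, Customer result) { }
    @Override public void onAfterError(MetricRegistry mr, Exception e) { }

    @Override
    public void onAfterSuccess(MetricRegistry mr, Customer result) {
        // one counter per customer record: how often was each record read?
        mr.counter("readCustomer.byId." + result.getId()).inc();
    }
}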

Example: Deriving Metrics from method parameter values

To answer questions like "when do customers of a particular age attempt to sign up?", an application might want to derive metrics from Customer entities passed into a registerCustomer method:

Example: Find out which time of the day new customers do sign up

@Metrics
@CustomMetric()
public void registerCustomer(
     @MetricParam(type=RegistrationsByTimeOfDay.class) Customer cust) {
      // ....
}

The interface expected by @MetricParam#type could look like:

public interface CustomMetricParam<T_PARAM, T_RESULT> {
    public void onBeforeEach(MetricRegistry mr, T_PARAM param);
    public void onAfterEach(MetricRegistry mr, T_PARAM param, T_RESULT result);
    public void onAfterSuccess(MetricRegistry mr, T_PARAM param, T_RESULT result);
    public void onAfterError(MetricRegistry mr, T_PARAM param, Exception e);
}

and RegistrationsByTimeOfDay.class is an implementation of CustomMetricParam<Customer, Void>. The class could inspect the customer in its onBeforeEach or onAfterEach callback and correlate the customer's age with the current hour of the day. Once again, the caveat is that I don't know how to statically detect a misuse like

@Metrics
@CustomMetric()
public void registerCustomer (
  @MetricParam(type=RegistrationsByTimeOfDay.class) Dog cust /* 'Dog' but expected 'Customer' */
) {
      // ....
}
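
Setting the misuse problem aside, a sketch of RegistrationsByTimeOfDay itself could look like this (assuming the CustomMetricParam<T_PARAM, T_RESULT> interface above; Customer#getAge() is a hypothetical accessor):

import java.time.LocalTime;
import org.eclipse.microprofile.metrics.MetricRegistry;

public class RegistrationsByTimeOfDay implements CustomMetricParam<Customer, Void> {
    @Override public void onAfterEach(MetricRegistry mr, Customer cust, Void result) { }
    @Override public void onAfterSuccess(MetricRegistry mr, Customer cust, Void result) { }
    @Override public void onAfterError(MetricRegistry mr, Customer cust, Exception e) { }

    @Override
    public void onBeforeEach(MetricRegistry mr, Customer cust) {
        // correlate the customer's age bracket with the current hour of the day
        int hour = LocalTime.now().getHour();
        int ageBracket = (cust.getAge() / 10) * 10;   // e.g. 37 -> 30
        mr.counter("registerCustomer.age" + ageBracket + ".hour" + hour).inc();
    }
}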

I am not sure whether these ideas align with the goals of the MP Metrics spec. They are naive proposals intended to allow a bit more flexibility with respect to implementing business metrics. I think I could implement most of this for my application as a supplement to MP Metrics. Just let me know if you consider it too domain-specific to be discussed further for the spec. Thank you.

--
Edit: Better samples and business cases. Inconsistencies fixed.

@donbourne
Member

Ideas discussed on 05/29 call:

Give annotations ability to track success/failure separately:

  • On annotation types we could have an attribute to indicate that you want to separately track failure cases (e.g. @Timed(trackFailuresSeparately=true)).
  • If an annotation had the indicator to track failures separately, it would need to create metrics for both the success case and the failure case.
  • Failure case would be when a method throws an exception.
  • This could apply to @Timed, @Counted, @Metered.

Another approach to counting separately:

@Counted(name=x, outcome=success)

// adds exception tag with exception classname
@Counted(name=x, outcome=failure)  

// default
@Counted(name=x, outcome=all)

Dropwizard also has the ExceptionMetered annotation https://github.com/dropwizard/metrics/blob/4.1-development/metrics-annotation/src/main/java/com/codahale/metrics/annotation/ExceptionMetered.java

@velias

velias commented Nov 2, 2018

This is IMO a very useful feature; I'm just looking for it as well. But saying that each exception is a failure is not a very good solution, I think. For example, you can have one exception indicating a data error and another indicating a system error, and you may want to count only the system errors (e.g. right now I need to count only system errors when I call a remote service from my service, but I do not want to count the data validation errors it returns).
So this feature should at least allow excluding certain named exceptions from being treated as failures.
Or handle it similarly to the EJB spec: treat unchecked/runtime exceptions as failures and checked exceptions as success.

@about-code
Author

@velias Maybe it's just a matter of wording. Something like

// adds exception tag with exception classname
@Counted(name=x, outcome=exception)  

would leave it open whether the exception is a failure or not. As far as I understand things, the tag will enable counting exceptions individually by class name. So if a method throws ValidationExceptions and RuntimeExceptions, there will be two tags: ValidationException with a count of, say, three and RuntimeException with a count of, say, two.

@t1

t1 commented Oct 24, 2022

I think a specific ErrorRate metric would be much easier to understand and even more powerful than the generic CustomMetric suggested above.

interface ErrorRate {
    void addError();
    void addSuccess();
    ErrorRateSnapshot getSnapshot();
}

@interface ErrorRated {
    /**
     * The name of a method that takes a throwable and determines if it's actually an error.
     * By default, all exceptions are considered to be an error.
     */
    String isError() default "";
    /**
     * The name of a method that takes a response object and determines if it's actually a success.
     * By default, all response objects are considered to be a success.
     */
    String isSuccess() default "";
}

public interface ErrorRateSnapshot {
	int getErrors();

	int getSuccesses();

	default double getErrorRate() {
		return ((double) getErrors()) / getCount();
	}

	default int getCount() {
		return getErrors() + getSuccesses();
	}
}
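
A quick worked example of the snapshot defaults (a throwaway anonymous implementation, just to show the arithmetic):

// 2 errors and 8 successes -> count 10, error rate 0.2
ErrorRateSnapshot snapshot = new ErrorRateSnapshot() {
    @Override public int getErrors() { return 2; }
    @Override public int getSuccesses() { return 8; }
};
System.out.println(snapshot.getCount());     // 10
System.out.println(snapshot.getErrorRate()); // 0.2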

WDYT? I could add a PR here and maybe to https://github.com/smallrye/smallrye-metrics

@donbourne
Member

@t1 , looking at the Fault Tolerance spec, we might want to differentiate successes and failures in a manner similar to @Fallback annotations. See https://microprofile.io/project/eclipse/microprofile-fault-tolerance/spec/src/main/asciidoc/fallback.asciidoc .

In particular, the Fallback annotation has a skipOn attribute (exceptions that should not trigger the fallback) and an applyOn attribute (exceptions that should trigger the fallback).

Using a similar idea for metrics annotations could give us a consistent way to indicate which thrown exceptions should be treated (and counted/timed) as failures.

For example, someone could annotate a method with a @Counted annotation as follows:

// count successes and failures in separately tagged counters
// treat any throwables as failures
@Counted(name="x", outcome=all)

// count successes and failures in separately tagged counters
// treat ExceptionA as failure and all other throwables as success
@Counted(name="x", outcome=all, failure=ExceptionA.class)

// count successes and failures in separately tagged counters
// treat ExceptionB, ExceptionC as success, and any other throwables as failures
@Counted(name="x", outcome=all, success={ExceptionB.class,ExceptionC.class})

The ErrorRate interface would be more flexible than what I've described (in particular, giving full control to evaluate returned objects for success or failure), but I'm not sure we should make the metrics annotations more elaborate than the fault tolerance annotations when it comes to determining method success/failure. If needed, developers can always fall back to using the non-annotation API for arbitrarily complex cases.

I also kind of like the alternate suggestion from @about-code for its simplicity:

// adds exception tag with exception classname
@Counted(name=x, outcome=exception)  

@t1

t1 commented Nov 3, 2022

@donbourne: I have created this issue in SmallRye Metrics, including a suggestion for an @ErrorRated annotation that looks like this:

@Inherited
@Documented
@InterceptorBinding
@Retention(RUNTIME)
@Target({TYPE, CONSTRUCTOR, METHOD, ANNOTATION_TYPE})
public @interface ErrorRated {

	/**
	 * Create a default class so the value is not required to be set all the time.
	 */
	class DEFAULT implements ErrorRateHandler<Object> {
		@Override public boolean isError(Object value) {
			return false;
		}

		@Override public boolean isError(Throwable throwable) {
			return true;
		}
	}

	/**
	 * Specify the error rate handler class to be used. The type parameter of the handler class must be assignable to the
	 * return type of the annotated method.
	 *
	 * @see #applyOn()
	 * @see #skipOn()
	 */
	@Nonbinding
	Class<? extends ErrorRateHandler<?>> value() default DEFAULT.class;

	/**
	 * The name of the meter.
	 */
	@Nonbinding
	String name() default "";

	/**
	 * The tags of the meter. Each {@code String} tag must be in the form of 'key=value'. If the input is empty or does
	 * not contain a '=' sign, the entry is ignored.
	 *
	 * @see org.eclipse.microprofile.metrics.Metadata
	 */
	@Nonbinding
	String[] tags() default {};

	/**
	 * If {@code true}, use the given name as an absolute name. If {@code false} (default), use the given name
	 * relative to the annotated class. When annotating a class, this must be {@code false}.
	 */
	@Nonbinding
	boolean absolute() default false;

	/**
	 * The display name of the meter.
	 *
	 * @see org.eclipse.microprofile.metrics.Metadata
	 */
	@Nonbinding
	String displayName() default "";

	/**
	 * The description of the meter.
	 *
	 * @see org.eclipse.microprofile.metrics.Metadata
	 */
	@Nonbinding
	String description() default "";

	/**
	 * The list of exception types which should be considered errors, including subclasses.
	 * <p>
	 * The {@link ErrorRateHandler} is only consulted if an exception is <em>not</em> in this list.
	 *
	 * @see #value()
	 */
	@Nonbinding
	Class<? extends Throwable>[] applyOn() default {};

	/**
	 * The list of exception types which should <em>not</em> be considered errors, including subclasses.
	 * <p>
	 * The {@link ErrorRateHandler} is only consulted if an exception is <em>not</em> in this list.
	 *
	 * @see #value()
	 */
	@Nonbinding
	Class<? extends Throwable>[] skipOn() default {};
}

This would give us all the flexibility with fully dynamic ErrorRateHandler classes or simpler applyOn and skipOn lists of exceptions.

I think a simple counter is not enough: we need to count successes and failures, and we need snapshots so we can have buckets of failure rates.
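
Usage could then look like this (a hypothetical example; javax.validation's ValidationException stands in for any exception that should not count as an error):

// ValidationException is skipped, i.e. counted as success;
// any other throwable is counted as an error
@ErrorRated(name = "createCustomer", skipOn = ValidationException.class)
public Customer createCustomer(Customer cust) { ... }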

@donbourne
Member

@t1, a few thoughts on your suggested @ErrorRated annotation...

  • I think you're suggesting @ErrorRated would be its own metric type (though I could be misinterpreting). I would say that we want Counters and Timers to be able to separately track successes/failures, rather than having a separate annotation just for that. By incorporating success/failure tracking into Timers, for example, you'd be able to time how long things take when they work vs. when they fail.

  • Tracking of rates is something we leave to the monitoring tools. For example, Prometheus lets you determine rates from a counter (e.g. rate(someCounter[someDuration])). It's not that we couldn't track rates within the runtime; it's that it's better to track rates in the monitoring tool, since monitoring tools let you say things like "give me the rate of increase of X over the last N minutes" or "give me the rate of increase of X yesterday from 2pm-3pm" -- whereas rate counting in the runtime isn't nearly as flexible, and burns CPU in the runtime. Having someCount(success=true) and someCount(success=false) would be sufficient for monitoring tools to then give you any failure rates you need over any time period.

  • applyOn / skipOn attribute names probably don't quite fit in this context. For Fallbacks, it means "apply/skip the fallback handler on ...". For counting, we're not applying/skipping anything, we're just deciding whether to count something as a success or a failure.

  • I like your isError approach -- though I think that if we were going to add it, we should have something equivalent in Fault Tolerance for consistency. Just to restate: I don't think it makes sense for Metrics to be smarter about evaluating method success/failure than Fault Tolerance.

@about-code
Author

about-code commented Nov 5, 2022

I also kind of like the alternate suggestion from @about-code for its simplicity:

// adds exception tag with exception classname
@Counted(name=x, outcome=exception)  

Just to add: I consider using the exception class name as tag name to be a default only. Of course, it is prone to breaking metric history if an exception class is renamed. So my recommendation for production would still be to consider using @Counted with custom tags. However, I also feel this is obvious to everyone, so just to have it spelled out.
