
[WIP] Adaptive activation functions #497

Draft: rbSparky wants to merge 4 commits into master from adaptive-activation-functions
Conversation

rbSparky

Work-in-progress for #355

@rbSparky rbSparky changed the title Adaptive activation functions [WIP] Adaptive activation functions Mar 19, 2022
@ChrisRackauckas ChrisRackauckas requested a review from zoemcc March 19, 2022 21:39
@@ -99,6 +99,7 @@ struct PhysicsInformedNN{isinplace,C,T,P,PH,DER,PE,AL,ADA,LOG,K} <: AbstractPINN
param_estim::PE
additional_loss::AL
adaptive_loss::ADA
adaptive_activation_function::ADF
Member

I don't think it needs to be added here. I think it just needs to be a property of the chain and added to the chain's weights list?

Member

Take a look at how the Flux functors are built: https://fluxml.ai/Flux.jl/stable/models/functors/

Author

> I think it just needs to be a property of the chain

Could you elaborate a bit on how that would be done?

I looked into how Flux functors are built and how they are used. Would it be implemented like this? It isn't exactly clear to me.

Contributor

One way you might implement this idea in Flux is by generating a Flux.Chain for the entire network that is a sequence of three building blocks, repeated:

Flux.Chain(
    Dense(in, out, identity; bias=true, init=nn_param_init),
    AdaptiveActivation(n, a),
    NonlinearActivation(nonlinearity),
)

The entire network would be those three layers repeated for however many hidden layers there are. Then you'd write a struct for AdaptiveActivation, which simply multiplies its input elementwise by the current value of n*a (very similar to the Diagonal implementation in Flux), and a NonlinearActivation struct, which has no trainable parameters and simply applies the desired nonlinearity after the adaptive scaling. In total you'd basically be recreating the Dense layer, but with an extra elementwise operation in the middle.

This part of the Flux code would be a good reference, since the building blocks you'd want to implement are similar to Dense and Diagonal, and you can see how the @functor macro gets used:
https://github.com/FluxML/Flux.jl/blob/ef04fda844ea05c45d8cc53448e3b25513a77617/src/layers/basic.jl#L82-L122
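To make that concrete, here is a minimal sketch of what those two building blocks might look like (the struct names come from the snippet above, but the field layout, constructor, and initialization are assumptions for illustration, not code from this PR):

using Flux

# AdaptiveActivation scales its input elementwise by n*a, where `a` is trainable
# and `n` is a fixed integer scale (assumption: one `a` per neuron, initialized
# so that n*a = 1).
struct AdaptiveActivation{T}
    n::Int
    a::T
end

AdaptiveActivation(n::Integer, dim::Integer) = AdaptiveActivation(n, fill(1f0 / n, dim))

(l::AdaptiveActivation)(x) = (l.n .* l.a) .* x

Flux.@functor AdaptiveActivation (a,)   # only `a` is collected as a trainable parameter

# NonlinearActivation has no trainable parameters; it just applies the nonlinearity.
struct NonlinearActivation{F}
    σ::F
end

(l::NonlinearActivation)(x) = l.σ.(x)

With those pieces, one hidden block of the chain above could be built as, e.g., Flux.Chain(Dense(2, 16), AdaptiveActivation(10, 16), NonlinearActivation(tanh)).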

Interestingly, the first paper's version is the hardest to implement here, because you have to make sure that each of the AdaptiveActivations uses the same value of a (this is known as weight tying or weight sharing). That kind of thing is covered in this Flux doc:
https://fluxml.ai/Flux.jl/stable/models/advanced/
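For that shared-a case, one hedged sketch (reusing the hypothetical AdaptiveActivation above) is to let every layer hold the very same parameter array, since Flux.params deduplicates by object identity:

n = 10
a_shared = fill(1f0 / n, 1)   # one global slope parameter, as in the first paper
tied = [AdaptiveActivation(n, a_shared) for _ in 1:4]
# All four layers reference the same array, so Flux.params sees a single
# parameter and the gradient contributions from every layer accumulate onto it.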

@zoemcc (Contributor) commented Mar 22, 2022

I agree with Chris that most of that work should not go here, as it is mostly neural-architecture related. I think it could make sense to add it to the DiffEqFlux repository, as was mentioned in the issue for this method. Alternatively, we could make a new file in this repo such as networks.jl and have it go there, along with other neural architectures that are implemented with PINNs in mind. If you want to include the adaptive activation function from the first paper referenced (https://arxiv.org/pdf/1906.01170) and also the second paper (https://arxiv.org/pdf/1909.12228), then I think it would make sense to write a single function such as

function AdaptiveActivationFeedForwardNetwork(hyperparameters...)

    function slope_recovery_loss_func(phi, θ, p)
        # calculate the slope-recovery loss here as a function of the θ
        # parameters that are generated for this network
        return regularizer_loss
    end

    return (network = FastChain(...), loss_func = slope_recovery_loss_func)
end

where you return a NamedTuple of either a DiffEqFlux.FastChain or a Flux.Chain, together with the function that computes the Slope Recovery loss from the second paper. The user would then be able to pass that network to the PINN and the Slope Recovery loss function to the PINN's additional_loss input. If it ends up being a huge improvement to the overall learning process, then it might make sense to include it in the internals of the PINN, but for now I think it makes sense to generate the network and feed the Slope Recovery regularizer in through the external interface. I don't think any changes to the internals of the PINN implementation will be required to implement these algorithms.

The hyperparameters you would want to include (as arguments to that function) are at least:

  • Number of hidden layers
  • Dimension of each hidden layer
  • Number of inputs and number of outputs
  • Which nonlinearity to apply after the scaling from the paper
  • Feedforward network hyperparameters such as initial parameter distributions
  • Which of the three different forms of adaptive activation to use
  • n value for the algorithm: initial a should be scaled such that n*a=1 for each a parameter

and possibly others that I didn't think of, which will become apparent to you during implementation (a rough signature sketch collecting these follows below).
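A rough signature sketch collecting those hyperparameters (every name, argument, and default here is an assumption for illustration, not code from this PR; it only covers the neuron-wise variant, builds on the hypothetical AdaptiveActivation and NonlinearActivation layers sketched earlier, and leaves the slope-recovery term as a placeholder):

function AdaptiveActivationFeedForwardNetwork(in_dim, out_dim;
        num_hidden_layers = 4,
        hidden_dim = 32,
        nonlinearity = tanh,
        n = 10,                         # fixed scale; initial a chosen so n*a = 1
        init = Flux.glorot_uniform)
    widths = [in_dim; fill(hidden_dim, num_hidden_layers)]
    blocks = [Flux.Chain(Dense(widths[i], widths[i + 1]; init = init),
                         AdaptiveActivation(n, widths[i + 1]),
                         NonlinearActivation(nonlinearity))
              for i in 1:num_hidden_layers]
    network = Flux.Chain(blocks..., Dense(hidden_dim, out_dim; init = init))

    function slope_recovery_loss_func(phi, θ, p)
        # placeholder: the slope-recovery regularizer from the second paper
        # would be computed here from the `a` entries inside θ
        return zero(eltype(θ))
    end

    return (network = network, loss_func = slope_recovery_loss_func)
end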

@zoemcc (Contributor) commented Mar 22, 2022

Also, I think there's an issue with your line-ending commit style, and that's why almost every line shows as changed. Are you committing with Windows-style line endings? I believe we're using Unix-style line endings, and having the two differ would result in almost every line being changed constantly (like what is being observed here).

I think it's an option in your git config settings.

@rbSparky (Author)

Thanks for the pointers! I think I have a clearer idea of what to do now. I'll create a networks.jl in this repo and start working on that.

Also, yes, I had Windows-style line endings until now; I'll switch to Unix-style line endings for further commits.

@zoemcc (Contributor) commented Mar 22, 2022

Also, here's an example of using the additional_loss interface for including your own loss terms in the PINN optimization:

https://neuralpde.sciml.ai/stable/pinn/parm_estim/
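For reference, a hedged sketch of wiring the NamedTuple from the hypothetical AdaptiveActivationFeedForwardNetwork above into that interface, following the constructor-call pattern used in the linked parameter-estimation example (the additional_loss function there takes (phi, θ, p), matching the skeleton above):

using NeuralPDE, Flux

net = AdaptiveActivationFeedForwardNetwork(2, 1; num_hidden_layers = 4, hidden_dim = 16)

discretization = NeuralPDE.PhysicsInformedNN(net.network,
                                             NeuralPDE.GridTraining(0.1);
                                             additional_loss = net.loss_func)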

@rbSparky rbSparky force-pushed the adaptive-activation-functions branch from c697e42 to 7ae9f97 Compare April 4, 2022 13:16
@rbSparky (Author) commented Apr 4, 2022

I have rewritten the skeleton in a new file, networks.jl, and removed the previous changes made to pinns_pde_solve.jl.

I wanted to ask:

  • In the main function AdaptiveActivationFeedForwardNetwork, we need the user to specify which type of adaptive activation function should be used. How should the function parameter for this be written?

Comment on lines +44 to +48
layer = Flux.Chain(
    Dense(in, out, σ=identity; bias=true, init=nn_param_init),
    AdaptiveActivation(n, a),
    NonlinearActivation(nonlinearity),
) # to be stacked for as many hidden layers specified (N)
Member

Is this actually needed, or is the AdaptiveActivation enough? I think those 8 lines are all that's really needed, right? And that could just be added to Flux's activation function list?

@zoemcc (Contributor) commented Apr 29, 2022

I've been really busy with a project deadline on Tuesday; I should be able to do a thorough review and provide guidance after that.
