An activation function in a neural network defines the output of a node given its inputs and a weight vector, introducing the non-linearity that allows networks to learn complex mappings between inputs and outputs. Common examples include the sigmoid function, which squashes inputs to the (0, 1) range but may suffer from vanishing gradients; the hyperbolic tangent (tanh), which outputs values in (−1, 1) and is zero-centered but still exhibits gradient decay; and the Rectified Linear Unit (ReLU), which outputs zero for negative inputs and the identity for positive inputs, enabling sparse activation and faster convergence in deep networks (Wikipedia). Advanced variants such as Leaky ReLU, Parametric ReLU, and Exponential Linear Units (ELUs) address ReLU's "dying neuron" problem by allowing a small, non-zero gradient for negative inputs, thereby improving learning stability and model robustness (Wikipedia). The choice and parameterization of activation functions significantly affect training dynamics, convergence speed, and final performance, making them a critical design decision in deep learning architectures.
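
For illustration, here is a minimal sketch of the activation functions mentioned above, written with NumPy. The function names and the default alpha values (0.01 for Leaky ReLU, 1.0 for ELU) are illustrative assumptions, not definitions taken from a specific library.

import numpy as np

def sigmoid(x):
    # Squashes inputs to the (0, 1) range; saturates for large |x|, which causes vanishing gradients.
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Outputs in (-1, 1); zero-centered but still saturates at the tails.
    return np.tanh(x)

def relu(x):
    # Zero for negative inputs, identity for positive inputs (sparse activation).
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Small non-zero slope (alpha, assumed default) for negative inputs mitigates "dying" neurons.
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    # Smooth exponential curve for negative inputs, approaching -alpha (alpha is an assumed default).
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

# Quick comparison on a few sample inputs.
x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for name, fn in [("sigmoid", sigmoid), ("tanh", tanh), ("relu", relu),
                 ("leaky_relu", leaky_relu), ("elu", elu)]:
    print(name, fn(x))

Running the comparison makes the trade-offs visible: sigmoid and tanh compress large inputs toward their bounds, while ReLU and its variants pass positive inputs through unchanged and differ only in how they treat negative inputs.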