Add documentation to burn core nn #1746
base: main
Conversation
Added links between the struct and the config. Added a link to the related burn_tensor function in the documentation for the forward function.
Moving the relu formula from the module API to the functional API, citing a paper relevant to relu, and mentioning the functional API in the module API
Adding documentation to the Linear module mentioning that the LinearConfig struct should be used when creating a Linear layer. Also adding links to the documentation that point people toward the right path
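For context, the config-then-`init` pattern these docs describe looks roughly like the sketch below. This is only an illustration: the `NdArray` backend import and the exact `init(&device)` signature are assumptions that can vary across burn versions. The same pattern applies to the other modules touched in this PR (PRelu, RotaryEncoding, Gru, Lstm).

```rust
use burn::backend::NdArray;
use burn::nn::{Linear, LinearConfig};
use burn::tensor::Tensor;

fn main() {
    type B = NdArray;
    let device = Default::default();

    // Build the layer from its config instead of constructing the struct directly.
    let linear: Linear<B> = LinearConfig::new(16, 32).init(&device);

    // Forward pass: the last dimension maps from d_input (16) to d_output (32).
    let input: Tensor<B, 2> = Tensor::zeros([4, 16], &device);
    let output = linear.forward(input);
    assert_eq!(output.dims(), [4, 32]);
}
```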
Added links between the struct and the config. Added a link to the struct in the forward function for more info.
Adding documentation stating that RotaryEncoding should be created using a RotaryEncodingConfig
Adding documentation to the prelu module:
- Linking the forward function documentation to the functional API
- Citing the first paper to mention PReLU
- Adding documentation saying that the PRelu layer should be created using PReluConfig
Added links for more info. Added shape info in some places.
Provide documentation for the Gru module, including its configuration and usage. Include a link to the paper that introduced the Gated Recurrent Unit (GRU) and specify that the module should be created using GruConfig. Also, mention that the forward function returns a state tensor with specific dimensions.
Adding documentation:
- Say to use the config to create the layers
- Add the mathematical formula to the PWFF forward pass
- Add a citation in the PWFF to the "Attention Is All You Need" paper
Provide documentation for the Lstm and BiLstm modules, including their configurations and usage. Include links to the papers that introduced Long Short-Term Memory (LSTM) and Bidirectional LSTM. Specify that the modules should be created using LstmConfig and BiLstmConfig respectively.
Adding documentation stating that the config should be used to create the layer
…odules Added references to the burn_tensor associated functions. Added links between the struct and the config.
…m/Les-Informatichiens/burn into add-documentation-to-burn-core-nn
…m/Les-Informatichiens/burn into add-documentation-to-burn-core-nn
Added references to the burn_tensor associated functions. Added links between the struct and the config.
…m/Les-Informatichiens/burn into add-documentation-to-burn-core-nn
Added references to the burn_tensor associated functions. Added links between the struct and the config.
Added references to the burn_tensor associated functions. Added links between the struct and the config. Removed the backend generic from the config since it's not needed (might be a breaking change).
…m/Les-Informatichiens/burn into add-documentation-to-burn-core-nn
Changed burn_tensor imports to crate::tensor
Changed burn_tensor imports to crate::tensor
…ig initialization
…m/Les-Informatichiens/burn into add-documentation-to-burn-core-nn
Thanks for the major clean-up 🧹 🙏
Some minor comments to address, mainly about formatting so the changes stay consistent.
```rust
// Shape `[batch_size, seq_length_1, d_model]`
query: Tensor<B, 3>,
// Shape `[batch_size, seq_length_2, d_model]`
key: Tensor<B, 3>,
// Shape `[batch_size, seq_length_2, d_model]`
```
To document struct fields we use a line doc (three forward slashes, `///`).
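For illustration, the suggestion applied to the snippet above; only the comment markers change:

```rust
/// Shape `[batch_size, seq_length_1, d_model]`
query: Tensor<B, 3>,
/// Shape `[batch_size, seq_length_2, d_model]`
key: Tensor<B, 3>,
```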
```diff
 use super::checks;

-/// Configuration to create an [1D convolution](Conv1d) layer.
+/// Configuration to create an [1D convolution](Conv1d) layer using the [init function](Conv1dConfig::init).
```
This was already there before, but if we're fixing the docs it should be "to create a 1D convolution".
```diff
-/// Configuration to create an [2D convolution](Conv2d) layer.
+/// Configuration to create an [2D convolution](Conv2d) layer, using the [init function](Conv2dConfig::init).
```
Same as with the conv1d: should be "to create a 2D convolution".
Also, there's no need for the comma in ", using the init function."
```diff
-#[derive(Module, Debug)]
-pub struct LeakyRelu<B: Backend> {
+///
+/// Should be created with [LeakyReluConfig](LeakyReluConfig).
+#[derive(Module, Clone, Debug)]
+pub struct LeakyRelu {
     /// The negative slope.
     pub negative_slope: f64,
-    phantom: PhantomData<B>,
 }
```
👌 👌
```rust
/// - `X` is the input tensor
/// - `norm` is the normalization function
/// - `γ` is the learnable weight
/// - `β` is the learnable bias
/// - `Y` is the output tensor
```
For documentation purposes, maybe have `X` and `Y` together, either at the top or the end of the enumeration:

- `X` is the input tensor
- `Y` is the output tensor
- `norm` is the normalization function
- `γ` is the learnable weight
- `β` is the learnable bias
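As line docs, the reordered enumeration would read:

```rust
/// - `X` is the input tensor
/// - `Y` is the output tensor
/// - `norm` is the normalization function
/// - `γ` is the learnable weight
/// - `β` is the learnable bias
```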
```diff
 use crate::tensor::module::max_pool2d;

-/// Configuration to create an [2D max pooling](MaxPool2d) layer.
+/// Configuration to create an [2D max pooling](MaxPool2d) layer using the [init function](MaxPool2dConfig::init).
```
Was already there, but should be "to create a 2D max pooling".
```diff
 #[cfg(not(feature = "std"))]
 use num_traits::Float;

-/// Configuration to create an [PositionalEncoding](PositionalEncoding) layer.
+/// Configuration to create an [PositionalEncoding](PositionalEncoding) layer using the [init function](PositionalEncodingConfig::init).
```
Same comment. Should be "to create a PositionalEncoding".
```diff
@@ -186,6 +190,10 @@ pub struct BiLstmConfig {
 }

+/// The BiLstm module. This implementation is for Bidirectional LSTM.
+///
+/// Introduced in the paper: [Framewise phoneme classification with bidirectional LSTM and other neural network architectures](https://www.cs.toronto.edu/~graves/ijcnn_2005.pdf).
```
Nice literature review 😄 I wasn't aware of the origins for bidirectional LSTM
```diff
@@ -203,7 +205,7 @@ impl<B: Backend> TransformerDecoderLayer<B> {
             norm_first: config.norm_first,
         }
     }

+    /// forward pass for the TransformerDecoder layer
```
Let's keep a similar capitalization scheme for the line docs. Suggestion: `Applies the TransformerDecoder forward pass to the input tensor.`
```rust
///
/// # Params
///
/// - linear inner: Linear layer with `d_model` input features and `d_ff` output features.
/// - linear outer: Linear layer with `d_ff` input features and `d_model` output features.
///
/// FFN(x) = max(0, xW1 + b1)W2 + b2
```
Keep the math in ticks? `FFN(x) = max(0, xW1 + b1)W2 + b2`
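Applied to the snippet, the last line of the doc block would become (the formula is as given in the diff, matching the position-wise feed-forward definition from "Attention Is All You Need"):

```rust
/// `FFN(x) = max(0, xW1 + b1)W2 + b2`
```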
Pull Request Template
Checklist
- The `run-checks all` script has been executed.
Related Issues/PRs
Related to #594
Changes
Revised Rust documentation for burn core neural network modules.