Hidden state dimensions in Pytorch LSTM
Please read the question completely before you mark it as a duplicate.
I was trying to understand the syntax of using an LSTM in PyTorch, and I came across the following in the PyTorch docs.
h_0: tensor of shape $(D * \text{num_layers}, N, H_{out})$ containing the initial hidden state for each element in the batch. Defaults to zeros if $(h_0, c_0)$ is not provided.

where:

$$\begin{aligned} N &= \text{batch size} \\ L &= \text{sequence length} \\ D &= 2 \text{ if bidirectional=True otherwise } 1 \\ H_{in} &= \text{input_size} \\ H_{cell} &= \text{hidden_size} \\ H_{out} &= \text{proj_size if proj_size} > 0 \text{ otherwise hidden_size} \end{aligned}$$
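For concreteness, here is a minimal sketch of how I understand these shapes (the sizes below are made up, and I use the default batch_first=False layout):

```python
import torch
import torch.nn as nn

# Made-up sizes for illustration only
input_size, hidden_size, num_layers = 10, 20, 2
batch_size, seq_len = 3, 5

lstm = nn.LSTM(input_size, hidden_size, num_layers)  # unidirectional, so D = 1

x = torch.randn(seq_len, batch_size, input_size)        # (L, N, H_in)
h_0 = torch.zeros(num_layers, batch_size, hidden_size)  # (D * num_layers, N, H_out)
c_0 = torch.zeros(num_layers, batch_size, hidden_size)  # (D * num_layers, N, H_cell)

output, (h_n, c_n) = lstm(x, (h_0, c_0))
print(output.shape)  # torch.Size([5, 3, 20])  -> (L, N, D * H_out)
print(h_n.shape)     # torch.Size([2, 3, 20])  -> (D * num_layers, N, H_out)
```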
I did not understand why each sequence in the batch has a different initial hidden state. Does each sequence in the batch have its own LSTM cell that is learned separately? Say I have a long paragraph of text: should I split it into batches of sentences? What are the best practices for splitting text data into mini-batches?
Topic mini-batch-gradient-descent pytorch lstm
Category Data Science