
RecurrentActorCriticPolicy Behaviour Not Clear #246

Open
2 tasks done
pasinit opened this issue May 9, 2024 · 1 comment
Labels
documentation Improvements or additions to documentation

Comments


pasinit commented May 9, 2024

📚 Documentation

I am trying to understand how the RecurrentActorCriticPolicy works. Coming from an NLP background, I am used to having tensors of shape (batch_size, seq_len, feature_dim) as input to the LSTM (along with optional starting hidden states). From what I am seeing, however, the LSTM as implemented basically only allows feeding sequences of length 1:

```python
for features, episode_start in zip_strict(features_sequence, episode_starts):
```

In fact, by zipping features_sequence (with shape [seq_len, n_envs, feature_dims]) and episode_starts (with shape [n_envs, -1]), in the case of 1 environment, we only allow seq_len to be 1.

Is this intended, and am I reading this correctly? Is the reasoning that, since we keep propagating the state, sequences of length 1 are still sufficient?
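For what it's worth, the intuition in the question checks out: stepping an LSTM one timestep at a time while carrying the hidden state forward is mathematically equivalent to feeding the full sequence at once, as long as the state is never reset in between. A minimal sketch (not the sb3-contrib code, just plain PyTorch):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# PyTorch's default layout is (seq_len, batch, feature_dim)
lstm = nn.LSTM(input_size=4, hidden_size=3)

seq_len, n_envs, feat_dim = 5, 2, 4
features = torch.randn(seq_len, n_envs, feat_dim)

# Full-sequence pass (initial state defaults to zeros)
full_out, _ = lstm(features)

# Step-by-step pass: length-1 sequences with propagated state
state = None
step_outs = []
for t in range(seq_len):
    out, state = lstm(features[t : t + 1], state)
    step_outs.append(out)
step_out = torch.cat(step_outs, dim=0)

print(torch.allclose(full_out, step_out, atol=1e-6))  # True
```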


@pasinit pasinit added the documentation Improvements or additions to documentation label May 9, 2024
araffin (Member) commented May 10, 2024

> tensors of the shape (batch_size, seq_len, feature_dim) as input to the LSTM

That's correct.

I think you missed:

```python
# If we don't have to reset the state in the middle of a sequence
# we can avoid the for loop, which speeds up things
if th.all(episode_starts == 0.0):
    lstm_output, lstm_states = lstm(features_sequence, lstm_states)
    lstm_output = th.flatten(lstm_output.transpose(0, 1), start_dim=0, end_dim=1)
    return lstm_output, lstm_states
```

Here we pass the full sequence as input.
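The transpose/flatten at the end of that fast path just reshapes the LSTM output from (seq_len, n_seq, hidden) into one row per (sequence, timestep) pair. A quick shape check, assuming illustrative dimensions:

```python
import torch

seq_len, n_seq, hidden = 5, 2, 3
lstm_output = torch.randn(seq_len, n_seq, hidden)

# (seq_len, n_seq, hidden) -> (n_seq, seq_len, hidden) -> (n_seq * seq_len, hidden)
flat = torch.flatten(lstm_output.transpose(0, 1), start_dim=0, end_dim=1)
print(flat.shape)  # torch.Size([10, 3])
```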

For the rest, we unroll the sequence manually, because we need to reset the state of the LSTM when a new episode starts:

```python
# Reset the states at the beginning of a new episode
(1.0 - episode_start).view(1, n_seq, 1) * lstm_states[0],
(1.0 - episode_start).view(1, n_seq, 1) * lstm_states[1],
```
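Put together, the slow path looks roughly like the following. This is a simplified, hypothetical sketch of the unrolled loop, not the library code itself: one LSTM step per timestep, with the (h, c) state zeroed for whichever sequences begin a new episode at that step.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n_seq, feat_dim, hidden = 2, 4, 3
lstm = nn.LSTM(feat_dim, hidden)

seq_len = 5
features_sequence = torch.randn(seq_len, n_seq, feat_dim)

# 1.0 marks the first step of a new episode; here env 1 restarts at t=3
episode_starts = torch.zeros(seq_len, n_seq)
episode_starts[0] = 1.0
episode_starts[3, 1] = 1.0

h = torch.zeros(1, n_seq, hidden)
c = torch.zeros(1, n_seq, hidden)
outputs = []
for features, episode_start in zip(features_sequence, episode_starts):
    # Zero the state for sequences where a new episode begins
    mask = (1.0 - episode_start).view(1, n_seq, 1)
    out, (h, c) = lstm(features.unsqueeze(0), (mask * h, mask * c))
    outputs.append(out)
lstm_output = torch.cat(outputs, dim=0)  # (seq_len, n_seq, hidden)
```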
