When training a transformer on positionally encoded embeddings, should the target (tgt) output embeddings also be positionally encoded? If so, wouldn’t the predicted (decoded) embeddings come out positionally encoded as well?
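For concreteness, here is a minimal sketch of the setup being asked about. It assumes the sinusoidal encoding scheme from the original Transformer and adds it to both the source embeddings and the target embeddings fed to the decoder. The function names (`sinusoidal_encoding`, `add_positions`) and the toy embedding values are hypothetical, chosen only for illustration.

```python
import math

def sinusoidal_encoding(seq_len, d_model):
    """Return a seq_len x d_model table of sinusoidal position encodings."""
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            # Dimensions 2k and 2k+1 share the frequency 1 / 10000^(2k / d_model).
            angle = pos / (10000 ** ((i - i % 2) / d_model))
            # Even dimensions use sine, odd dimensions use cosine.
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

def add_positions(embeddings):
    """Add position encodings element-wise to a list of embedding vectors."""
    pe = sinusoidal_encoding(len(embeddings), len(embeddings[0]))
    return [[e + p for e, p in zip(vec, pos)] for vec, pos in zip(embeddings, pe)]

# Hypothetical toy embeddings (d_model = 4), not taken from any real model.
src_emb = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
tgt_emb = [[1.0, 1.0, 1.0, 1.0], [2.0, 2.0, 2.0, 2.0], [3.0, 3.0, 3.0, 3.0]]

encoder_input = add_positions(src_emb)  # source side
decoder_input = add_positions(tgt_emb)  # target side, the case in question
```

This sketch only covers the input side: it shows the same position table being added to both sequences before they enter the model. It does not model what the decoder emits, which is the part the question is about.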