Method and system for combining video sequences with spatio-temporal alignment