Is there a limit on music length with this method?

Nice to see contrastive learning applied to music. Is there a limit on input length? Is it possible to get a meaningful representation of a whole song a few minutes long with this method, for example a vector with a few hundred dimensions? Looking forward to your reply, thanks a lot.