First, I would advise reducing the dimensionality of these images.
If you apply K-means directly to the raw pixels, you can run into the curse of dimensionality, which makes algorithms that rely on distances between points lose precision.
I do not mean literally shrinking the images, but rather using PCA or SVD, which retains most of the relevant image information.
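As a rough sketch of PCA followed by K-means, something like the following could work with scikit-learn. The function name `cluster_images`, the random stand-in images, and the values for `n_components` and `n_clusters` are assumptions made for illustration; you would need to tune them for your own data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA


def cluster_images(images, n_components=50, n_clusters=5, seed=0):
    """Flatten the images, reduce them with PCA, then cluster with K-means."""
    # One row per image, one column per pixel.
    X = np.asarray(images, dtype=np.float64).reshape(len(images), -1)
    # PCA cannot keep more components than samples or features.
    n_components = min(n_components, X.shape[0], X.shape[1])
    reduced = PCA(n_components=n_components, random_state=seed).fit_transform(X)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(reduced)
    return reduced, labels


# Stand-in data (assumption): 200 random 32x32 grayscale "images".
rng = np.random.default_rng(0)
demo_images = rng.random((200, 32, 32))

reduced, labels = cluster_images(demo_images, n_components=20, n_clusters=4)
print(reduced.shape, labels.shape)
```

Each image goes from 1024 pixel values to 20 components before any distance is computed, which is the point of the reduction.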
Other approaches, such as hierarchical clustering or autoencoders, can be useful as well.
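For the hierarchical option, a minimal sketch with scikit-learn's `AgglomerativeClustering` might look like this. The random feature matrix stands in for image features that have already been reduced, and the choice of 4 clusters with Ward linkage is only an assumption for the example.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Stand-in data (assumption): 150 images already reduced to 20 features each.
rng = np.random.default_rng(0)
features = rng.random((150, 20))

# Ward linkage merges, at each step, the two clusters whose union
# increases the within-cluster variance the least.
hier = AgglomerativeClustering(n_clusters=4, linkage="ward")
hier_labels = hier.fit_predict(features)
print(np.bincount(hier_labels))
```

Keep in mind that agglomerative clustering compares every pair of samples, so it gets expensive quickly as the number of images grows.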
Another important point is the memory required to handle this many images. Depending on the algorithm and how much RAM your computer has, you can freeze the machine.
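One way to keep memory under control is to process the images in batches instead of loading them all at once. The sketch below uses scikit-learn's `IncrementalPCA` and `MiniBatchKMeans`, both of which support `partial_fit`. The `iter_batches` loader is hypothetical and just generates random data; in practice it would read your images from disk. The batch size and the other constants are assumptions too.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.decomposition import IncrementalPCA


def iter_batches(n_images, batch_size, shape=(32, 32), seed=0):
    """Hypothetical loader: yields batches of flattened images.

    Replace the random data with code that reads your images from disk.
    """
    rng = np.random.default_rng(seed)
    for start in range(0, n_images, batch_size):
        size = min(batch_size, n_images - start)
        yield rng.random((size, shape[0] * shape[1]))


N_IMAGES, BATCH, N_COMPONENTS, N_CLUSTERS = 1000, 250, 20, 4

# Pass 1: fit the PCA projection one batch at a time.
ipca = IncrementalPCA(n_components=N_COMPONENTS)
for batch in iter_batches(N_IMAGES, BATCH):
    ipca.partial_fit(batch)

# Pass 2: update the K-means centers on the reduced batches.
km = MiniBatchKMeans(n_clusters=N_CLUSTERS, random_state=0)
for batch in iter_batches(N_IMAGES, BATCH):
    km.partial_fit(ipca.transform(batch))

# Pass 3: assign every image to its nearest center.
stream_labels = np.concatenate(
    [km.predict(ipca.transform(batch)) for batch in iter_batches(N_IMAGES, BATCH)]
)
print(stream_labels.shape)
```

Only one batch is in memory at any time, at the cost of reading the data three times. Note that each batch must contain at least `n_components` images for `IncrementalPCA.partial_fit` to accept it.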
There are also more direct methods, such as comparing the pedestrians in image A with those in image B, but I do not think that works very well.
In short, there are several ways to do this type of clustering.