Nice explanation.
I have worked on crack detection, but my images were not this complex.
They were pretty straightforward, with the cracks captured up close. For that dataset, I implemented image processing and was able to achieve about 96% accuracy.
https://medium.datadriveninvestor.com/building-collapsed-7543580ba698
For the dataset you mention, image processing alone won't work.
Kudos!!