Find the business vertical of a website from its URL alone, or cluster similar websites by URL
I have been exploring how to tag or cluster websites by business domain using nothing but the website URL. For example:
amazon.com = e-commerce
bbc.co.uk = news
adidas.com = sports apparel
I have read some research papers that cluster websites with unsupervised learning algorithms, such as CLUE.
One approach would be to build a repository of labeled websites and then train a model on it to tag similar websites, but that does not seem like a feasible option.
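To make the labeled-repository idea concrete, here is a minimal sketch of what I have in mind. It uses only the Python standard library and compares character trigrams of the hostname against a per-label centroid. The SEED list, the function names, and the choice of trigrams with cosine similarity are all my own assumptions for illustration, not something taken from the papers.

```python
from collections import Counter
from math import sqrt
from urllib.parse import urlparse

# Hypothetical hand-labeled seed set; a real one would need to be far larger.
SEED = [
    ("amazon.com", "e-commerce"),
    ("ebay.com", "e-commerce"),
    ("bbc.co.uk", "news"),
    ("cnn.com", "news"),
    ("adidas.com", "sports apparel"),
    ("nike.com", "sports apparel"),
]


def char_ngrams(url, n=3):
    """Return a bag of character n-grams taken from the hostname of a URL."""
    # urlparse only recognises the host when the URL has a "//" prefix.
    parsed = urlparse(url if "//" in url else "//" + url)
    host = parsed.hostname or ""
    if host.startswith("www."):
        host = host[4:]
    padded = "^" + host + "$"  # mark the start and end of the hostname
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def train(labeled):
    """Sum the n-gram counts of every URL that shares a label."""
    centroids = {}
    for url, label in labeled:
        centroids.setdefault(label, Counter()).update(char_ngrams(url))
    return centroids


def predict(centroids, url):
    """Return the label whose centroid is most similar to the URL."""
    grams = char_ngrams(url)
    return max(centroids, key=lambda label: cosine(grams, centroids[label]))


if __name__ == "__main__":
    model = train(SEED)
    print(predict(model, "https://www.amazon.com/some/path"))
```

This only picks up on surface similarity between hostnames, so a brand name that says nothing about its industry will not be classified reliably, which is part of why I doubt the approach.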
Any ideas on how to tackle this problem would be helpful. Pointers to Python libraries that can query a website URL and return its business vertical would also help, since I could use them to label the site domains.