Should I use ordinal encoding or one-hot encoding when using DBSCAN for content clustering on websites?
I want to group the preparation steps on cooking recipe websites into one cluster so that I can distinguish them from the rest of the page.
To achieve this, I extracted the DOM path (e.g. body-div-div-table-tr ....) of each text node on the page and one-hot encoded the paths before running the DBSCAN clustering algorithm.
My hope was that DBSCAN would group not only 100% identical DOM paths into one common cluster, because sometimes one preparation step is, e.g., in a tag and the others are not. But even though I tried many different values for the epsilon and MinPoints parameters, it does not recognize them all as one cluster.
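The setup and the behaviour described above can be sketched as follows. The paths are made up for illustration and the encoding is hand-rolled in plain Python, so this is an assumption about the pipeline rather than the exact code used. It one-hot encodes each whole path as a single category and prints the pairwise Euclidean distances that a distance-based algorithm like DBSCAN would see.

```python
import math

# Hypothetical DOM paths (not real data); the third step sits inside an extra tag.
paths = [
    "body-div-div-ol-li",
    "body-div-div-ol-li",
    "body-div-div-ol-li-span",
    "body-footer-div-a",
]

# One-hot encode each whole path as one category.
categories = sorted(set(paths))
index = {path: i for i, path in enumerate(categories)}

def one_hot(path):
    """Return a 0/1 vector with a single 1 at the position of this path."""
    vec = [0] * len(categories)
    vec[index[path]] = 1
    return vec

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

vectors = [one_hot(p) for p in paths]

# Pairwise distance matrix between all encoded paths.
distances = [[euclidean(a, b) for b in vectors] for a in vectors]
for row in distances:
    print([round(d, 3) for d in row])
```

In this sketch, identical paths are at distance 0 and any two different paths are at exactly the same distance (the square root of 2), whether they differ by one trailing tag or by almost everything. Under that assumption, no value of epsilon can include the nearly identical path without also including the unrelated one.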
My question: Is one-hot encoding maybe the wrong approach, because a DOM path is not really 100% categorical but rather a kind of ordinal? After all, the more DOM path elements two DOM paths have in common, the more likely it is that they belong to one common cluster.
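One possible way to express the "more common elements means closer" idea is a distance based on the shared leading elements of two paths. The following is only a sketch: the function name `path_distance`, the normalisation by the longer path, and the example paths are all assumptions made for illustration, and it assumes "-" separates path elements as in the example above.

```python
def path_distance(a, b):
    """Distance in [0, 1]: 0 for identical paths, 1 if even the first element differs.

    Hypothetical measure: one minus the shared-prefix length divided by
    the length of the longer path.
    """
    elems_a, elems_b = a.split("-"), b.split("-")
    common = 0
    for x, y in zip(elems_a, elems_b):
        if x != y:
            break
        common += 1
    return 1 - common / max(len(elems_a), len(elems_b))

# Hypothetical DOM paths (not real data).
paths = [
    "body-div-div-ol-li",
    "body-div-div-ol-li",
    "body-div-div-ol-li-span",
    "body-footer-div-a",
]

# Pairwise distance matrix between all paths.
matrix = [[path_distance(a, b) for b in paths] for a in paths]
for row in matrix:
    print([round(d, 3) for d in row])
```

With these made-up paths, the step wrapped in an extra tag ends up much closer to the other steps than the footer link does. A matrix like this could then be passed to scikit-learn's DBSCAN with `metric="precomputed"` instead of one-hot vectors.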