WebPage-Segmentation--WPS-

Introduction

This is WPS-DB, our webpage segmentation method. Unlike other methods such as VIPS and Block-O-Matic, we use DBSCAN instead of K-means to cluster our data.
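To illustrate why DBSCAN can be a convenient fit for this kind of task, here is a minimal, dependency-free sketch of the algorithm. It is not the code used in this repository: the input points (hypothetical pixel centers of page elements), `eps`, and `min_pts` are assumptions chosen for illustration, and the features and parameters WPS-DB actually uses may differ. The point it shows is that DBSCAN needs no cluster count up front, unlike K-means, and that isolated elements are labeled as noise.

```python
from math import dist  # Python 3.8+


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
                continue
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # core point: keep growing the cluster
    return labels


# Hypothetical (x, y) centers of page elements, in pixels.
element_centers = [
    (10, 10), (12, 14), (15, 11),        # e.g. items in a header
    (400, 300), (405, 310), (398, 305),  # e.g. items in a content block
    (900, 50),                           # an isolated element
]
labels = dbscan(element_centers, eps=20, min_pts=2)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

The two dense groups become clusters 0 and 1 without the number of clusters being specified, and the isolated element is reported as noise (-1) rather than being forced into a cluster.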

Testing for Stack Overflow (Questions tab)

https://stackoverflow.com/questions

Testing for Stack Exchange

https://stackexchange.com

Testing on more pages (using Block-O-Matic's dataset)

Please visit this site to view the results:

https://drive.google.com/drive/folders/1uEAfsyFiR82Vejc26fgoWBLR1VpSaI-b?usp=sharing

Usage

  • Install dependencies: pip install -r requirments.txt

  • Run WPS-DB:

    • Download our Jupyter Notebook and run your own tests, or
    • Use the command: python WPS_DB_Test.py <your webpage's url>
  • Check the Screenshots folder in the current working directory to see the segmentation layout.
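
To try several pages in one go, the command above can be wrapped in a small script. This is only a sketch under stated assumptions: `build_command` and `run_batch` are hypothetical helper names, the script is assumed to sit in the current working directory, and nothing is assumed about the names or formats of the files written to the Screenshots folder.

```python
import subprocess
import sys
from pathlib import Path

# Test pages mentioned in this README.
URLS = [
    "https://stackoverflow.com/questions",
    "https://stackexchange.com",
]


def build_command(url, script="WPS_DB_Test.py"):
    """Return the documented command line for one page as an argument list."""
    return [sys.executable, script, url]


def run_batch(urls, screenshots_dir="Screenshots"):
    """Run the script once per URL, then list whatever is in the output folder."""
    for url in urls:
        # check=False: keep going even if one page fails.
        result = subprocess.run(build_command(url), check=False)
        print(f"{url}: exit code {result.returncode}")
    out = Path(screenshots_dir)
    # Return the file names found, or an empty list if the folder is missing.
    return sorted(p.name for p in out.iterdir()) if out.is_dir() else []
```

Calling `run_batch(URLS)` from the repository folder runs both pages in sequence and returns the names of the files found in Screenshots afterwards.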