Crawling & Extracting English-Punjabi Parallel Data


Crawl & Extract Parallel Data

Developed by Manpreet Singh Lehal
(under the supervision of Dr. Vishal Goyal & Dr. Ajit Kumar)

Crawler

Enter the URL of a site, and the tool removes unwanted characters and separates the Punjabi text from the English text. It also collects every link found on that page into a dropdown, from which you can start another crawl.
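The page does not describe how the crawler is implemented. As a rough illustration only, the separation step could be sketched by counting characters per script: Gurmukhi, the script Punjabi is assumed to be written in here, occupies the Unicode range U+0A00 to U+0A7F. The names `LinkAndTextParser` and `split_by_script` are hypothetical and are not taken from the tool.

```python
import re
from html.parser import HTMLParser

# Gurmukhi occupies the Unicode block U+0A00-U+0A7F.
GURMUKHI = re.compile(r"[\u0A00-\u0A7F]")
LATIN = re.compile(r"[A-Za-z]")


class LinkAndTextParser(HTMLParser):
    """Collects href values and visible text chunks from an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.chunks = []
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace
        if text and not self._skip:
            self.chunks.append(text)


def split_by_script(html):
    """Return (punjabi_chunks, english_chunks, links) for one HTML page.

    Hypothetical rule: a chunk goes to whichever script contributes more
    letters; chunks with no Gurmukhi or Latin letters are dropped.
    """
    parser = LinkAndTextParser()
    parser.feed(html)
    parser.close()
    punjabi, english = [], []
    for chunk in parser.chunks:
        gurmukhi_count = len(GURMUKHI.findall(chunk))
        latin_count = len(LATIN.findall(chunk))
        if gurmukhi_count > latin_count:
            punjabi.append(chunk)
        elif latin_count > 0:
            english.append(chunk)
    return punjabi, english, parser.links
```

The returned links are whatever appears in each `href`, so relative links would still need resolving against the page URL (for example with `urllib.parse.urljoin`) before crawling again.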


Extractor

This tool outputs aligned English and Punjabi data for the given input. You can also choose how strictly the output is refined with the Tuning dropdown.
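The page does not say how the alignment or the Tuning scale work. Purely as an illustration of the idea, the sketch below pairs sentences by position and keeps a pair only if the two word counts are within a given ratio of each other. The function `align_by_length` and its `max_ratio` parameter are assumptions standing in for a refinement setting; they are not the tool's actual method.

```python
def align_by_length(english, punjabi, max_ratio=2.0):
    """Pair sentences by position and filter on word-count ratio.

    Hypothetical heuristic: a lower max_ratio is stricter and keeps fewer,
    more plausible pairs; a higher max_ratio keeps more pairs.
    """
    pairs = []
    for en, pa in zip(english, punjabi):
        en_len, pa_len = len(en.split()), len(pa.split())
        if en_len == 0 or pa_len == 0:
            continue  # skip empty sentences
        ratio = max(en_len, pa_len) / min(en_len, pa_len)
        if ratio <= max_ratio:
            pairs.append((en, pa))
    return pairs
```

Calling it with a small `max_ratio` gives a stricter output and a larger value a more permissive one, which is one way a refinement scale could behave.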


All-In-One

This tool combines the crawler, the extractor, and the cleaner, so the user need not supply sentences as input. Enter the URL and hit Submit to get clean, aligned data as output, along with two separate outputs for English and Punjabi.
