Polish Names Declension, 2020

Tech stack:

Wikipedia

Selenium

Chromium Webdriver

Pandas

Levenshtein Distance

Python

Story

During the COVID-19 pandemic, while wandering through GitHub, I came across a code repository that tackled the declension of Polish names. It looked quite outdated, so out of creative boredom I decided to update it and put my web-scraping skills to a real test.

 

Solution

My priority was to scrape responsibly. I ran the scraper with Selenium and a Chromium WebDriver on my local machine. In the end, Wikipedia raised no protest and politely returned updated declensions for more names than the original dataset contained.
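The tech stack lists Levenshtein distance, but this write-up does not say how it was used. One plausible use, assumed here purely for illustration, is matching a scraped spelling against names already in the dataset. The sketch below is a minimal pure-Python version; the function names `levenshtein` and `closest_name` and the sample names are hypothetical and not taken from the project.

```python
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep rows short
    # previous[j] holds the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


def closest_name(candidate: str, known_names: List[str]) -> Tuple[str, int]:
    """Return the known name nearest to candidate, with its edit distance.

    Hypothetical helper: comparison is case-insensitive, and ties go to
    the name that appears first in known_names.
    """
    scored = ((name, levenshtein(candidate.lower(), name.lower()))
              for name in known_names)
    return min(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Sample names chosen for illustration only.
    known = ["Małgorzata", "Katarzyna", "Stanisław"]
    print(closest_name("Malgorzata", known))  # ('Małgorzata', 1)
```

A small distance threshold (for example, accepting only matches at distance 1 or 2) would be a reasonable guard against pairing unrelated names, though the right cut-off depends on the data.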

 

Example names

 

Declension rules

 

 

Lesson Learned

In the end, this simple programming exercise turned out to be a fun challenge. I enjoyed it a lot and learned a great deal about the intricacies of the Polish language, which, unsurprisingly, is my mother tongue.

Statistics

• 2: data sources
• 148: additional missings
• 577: Polish names in the original dataset
• 687: new Polish names
• 1116: Polish names in the updated dataset:
  • 570: female
  • 801: male