Tech stack:
Wikipedia
Selenium
Chromium Webdriver
Pandas
Story
During the COVID-19 pandemic, while wandering around GitHub, I discovered a code repository that handled the declension of Polish names. It looked quite outdated, so out of creative boredom I decided to update it and put my web-scraping skills to a real test.
Solution
My priority was to scrape responsibly. The scraper ran on my local machine, using Selenium with a Chromium WebDriver. In the end, Wikipedia did not protest and politely returned updated declensions for more names than the original dataset contained.
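To illustrate what responsible scraping with Selenium might look like, here is a minimal sketch, not the code actually used in this project. The function names `polite_fetch` and `make_selenium_fetcher`, the two-second delay, and the headless flag are all assumptions chosen for illustration. The idea is simply to request one page at a time, pause between requests, and never fetch the same page twice.

```python
import time
from typing import Callable, Dict, Iterable


def polite_fetch(urls: Iterable[str],
                 fetch: Callable[[str], str],
                 delay_s: float = 2.0) -> Dict[str, str]:
    """Fetch pages one at a time, pausing between requests.

    `delay_s` is an assumed, illustrative pause; pick a value that
    suits the site being scraped.
    """
    pages: Dict[str, str] = {}
    for url in urls:
        if url in pages:
            continue  # never request the same page twice
        if pages:
            time.sleep(delay_s)  # wait before every request after the first
        pages[url] = fetch(url)
    return pages


def make_selenium_fetcher():
    """Return (fetch, close) backed by a headless Chromium/Chrome session.

    Hypothetical helper: Selenium is imported lazily so that
    polite_fetch can be tried out without Selenium installed.
    """
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless=new")  # assumption: no visible window needed
    driver = webdriver.Chrome(options=options)

    def fetch(url: str) -> str:
        driver.get(url)            # load the page in the browser
        return driver.page_source  # HTML after the page has loaded

    return fetch, driver.quit
```

In use, one would call `make_selenium_fetcher()` once, pass the returned `fetch` function to `polite_fetch` together with the list of Wikipedia pages, and call the returned `close` function in a `finally` block so the browser always shuts down. Keeping the fetching logic separate from the pacing logic also makes it easy to test the pacing with a fake `fetch`.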
Example names
- Female Polish names on Wikipedia: click here,
- Male Polish names on Wikipedia: click here.
Declension rules
Lesson Learned
In the end, this simple programming exercise turned out to be a fun challenge. I enjoyed it a lot and learned a great deal about the intricacies of the Polish language, which, not surprisingly, is my mother tongue.
Statistics
- 2: data sources,
- 148: additional missings,
- 577: Polish names in the original dataset,
- 687: new Polish names,
- 1116: Polish names in the updated dataset:
  - 570: female,
  - 801: male.