Hello! I created this small project to improve my Python skills by automating a daily task. As you can see, my code and skills still have room for improvement, but I'm happy to have achieved the project's goal. I've included a short to-do list of potential improvements in the main file, but I'm not working on them for the moment, as I want to focus on another project.
The different scrapers were developed in the following order:
- Jobup
- Glassdoor
- Indeed
As you can see, each scraper is quite different. I used Playwright, various SeleniumBase features, and, of course, BeautifulSoup. My conclusion regarding these frameworks is that I highly recommend SeleniumBase for extracting data from this type of website, as it performs best at bypassing bot detection (CDP mode) and at data extraction in general. Playwright is more useful for running automated tests on a website you've developed yourself. I haven't migrated my LinkedIn and Jobup scrapers to SeleniumBase's CDP mode, so they demonstrate how the other frameworks work, but if I were to rewrite them, I would use CDP mode. Why not SeleniumBase's UC mode? Because UC mode has been deprecated for bypassing bot detection, among other issues, although it still works correctly on some websites, such as LinkedIn.
It's important to note that this code currently works, but because websites are constantly updated, it might stop working, and you'll need to modify it to adapt. Another issue arises when performing rapid, intensive extractions from certain websites (Indeed and Glassdoor) from the same IP address: you may be rate-limited or have your IP address blocked (so use a proxy if you want to perform rapid extractions). The code may also raise errors; I probably haven't accounted for every issue that can occur when extracting data from these websites. I'll do my best to fix problems as I identify them while using the code. Apologies in advance for any issues you encounter.
I used the JetBrains PyCharm IDE on Windows 10 to run and debug this code. I haven't tested it with other IDEs, so I don't know if it works everywhere. To extract the data, Chrome must be installed on your computer.
pip install playwright
playwright install
pip install beautifulsoup4
pip install seleniumbase
pip install pandas
pip install xlsxwriter
Once the installation is complete, you will simply need to modify the main file according to your needs. Here is the first part to modify in main.py:
linkedin_dict = linkedin_scraper("https://www.linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search?keywords=Software+Engineer+OR+Embedded+Engineer&location=Switzerland&f_TPR=r86400")
jobup_dict = jobup_scraper("https://www.jobup.ch/fr/emplois/?publication-date=1&term=software%20engineer")
glassdoor_dict = glassdoor_scraper("https://fr.glassdoor.ch/Emploi/software-engineer-emplois-SRCH_KO0,17.htm?fromAge=7")
indeed_dict = indeed_scraper("https://ch-fr.indeed.com/jobs?q=ing%C3%A9nieur+informatique&l=&fromage=7")
all_job_dict = linkedin_dict | jobup_dict | glassdoor_dict | indeed_dict
You'll first need to find the correct URL for each website, except for LinkedIn. Simply go to the job search site (you can start from the link already in the code and modify the search terms), run your search, apply the various filters, and then copy and paste the resulting link into the function call corresponding to that website. For LinkedIn, I'm using the public API, so you'll need to build the URL yourself. You can find all the necessary information here
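The merge on the last line relies on Python's dict union operator (`|`, available since Python 3.9), which combines dicts left to right, with the right-hand value winning on duplicate keys. A minimal sketch with made-up job entries (the titles and URLs below are placeholders, not real scraper output):

```python
# Dict union (Python 3.9+): merges left to right; on duplicate keys,
# the right-hand dict's value overwrites the left-hand one.
linkedin_dict = {"Embedded Engineer, ACME": "https://example.com/job/1"}
indeed_dict = {"Software Engineer, Globex": "https://example.com/job/2"}

all_job_dict = linkedin_dict | indeed_dict
print(all_job_dict)
```

If you target Python 3.8 or older, `{**linkedin_dict, **indeed_dict}` achieves the same merge.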
If you don't want to use certain websites, you'll need to comment out the line calling the function and remove *dict from all_job_dict. Here's an example without Jobup or Glassdoor:
linkedin_dict = linkedin_scraper("https://www.linkedin.com/jobs-guest/jobs/api/seeMoreJobPostings/search?keywords=Software+Engineer+OR+Embedded+Engineer&location=Switzerland&f_TPR=r86400")
# jobup_dict = jobup_scraper("https://www.jobup.ch/fr/emplois/?publication-date=1&term=software%20engineer")
# glassdoor_dict = glassdoor_scraper("https://fr.glassdoor.ch/Emploi/software-engineer-emplois-SRCH_KO0,17.htm?fromAge=7")
indeed_dict = indeed_scraper("https://ch-fr.indeed.com/jobs?q=ing%C3%A9nieur+informatique&l=&fromage=7")
all_job_dict = linkedin_dict | indeed_dict
Now that you've set up the websites to analyze, you can filter the job postings that interest you using keywords. Postings whose descriptions don't contain any of the keywords are excluded, and you'll get an Excel file with two sheets: one containing the unfiltered postings and another containing the filtered postings.

To configure your keywords, you will need to modify the filters array in main.py. Here is an example:
filters = ["python", "vhdl", r"\Wc\W", "linux", "IOT", "systemverilog"]
Keywords are not case-sensitive, and you can add as many as you like. If a keyword is a single character, like the "c" in my example, you must wrap it in \W on both sides and add the r prefix to the start of the string, exactly as shown in the example, so it only matches as a standalone word.
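To see why the single-character case needs the \W guards, here is a minimal sketch of how such a keyword filter could work (the helper name and sample descriptions are illustrative assumptions, not taken from the project): it keeps a posting if any keyword matches its description, case-insensitively.

```python
import re

filters = ["python", "vhdl", r"\Wc\W"]

def matches_any(description, keywords):
    # Case-insensitive regex search; the \W guards around "c" stop it
    # from matching the "c" inside words like "documentation".
    return any(re.search(kw, description, re.IGNORECASE) for kw in keywords)

print(matches_any("Embedded work in C and assembly", filters))  # True
print(matches_any("Technical documentation role", filters))     # False
```

Note that \W requires a non-word character on each side, so a keyword like r"\Wc\W" won't match a "c" at the very start or end of the text; word boundaries (r"\bc\b") would also cover those edge cases.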
You can now run the code. I hope you find this useful.