
Web Scraping with Python and Beautiful Soup

Hosted by Jeroen Janssens
 
 
Zoom
About Event

Join me in this free, live master class and...

Let’s turn that messy HTML into a structured data set!

The internet is not just a collection of web pages; it’s a gigantic resource of interesting data. Being able to extract that data is a valuable skill. It’s certainly challenging, but with the right knowledge and tools, you’ll be able to leverage a wealth of information for your personal and professional projects.

Imagine building a web scraper that legally gathers information about potential houses to buy, a process that automatically fills in that tedious form to download a report, or a crawler that enriches an existing data set with weather information. In this live master class, I’ll show you how to accomplish just that using Python, Beautiful Soup, and a handful of other packages.

What you’ll learn in this master class

  • The challenge of scraping messy HTML.

  • The structure of an HTTP request.

  • How to target HTML elements and attributes using CSS selectors.

  • The difference between a static and a dynamic website.

  • How to extract data from a static website using beautifulsoup4.

  • Resources to continue learning about this exciting topic.
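
To give a flavor of the CSS-selector and beautifulsoup4 topics listed above, here is a minimal sketch. It is not taken from the master class itself: the HTML snippet, the `extract_houses` helper, and the class and attribute names are all made up for illustration, and the page is a hard-coded string rather than something downloaded with requests.

```python
from bs4 import BeautifulSoup

# Hypothetical, hand-written HTML standing in for a downloaded static page.
HTML = """
<ul id="listings">
  <li class="house" data-city="Rotterdam"><span class="price">350000</span></li>
  <li class="house" data-city="Utrecht"><span class="price">425000</span></li>
  <li class="ad">Sponsored</li>
</ul>
"""


def extract_houses(html):
    """Turn the (assumed) listing markup into a list of dictionaries."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # CSS selector: <li> elements with class "house" directly inside <ul id="listings">.
    for item in soup.select("ul#listings > li.house"):
        price = item.select_one("span.price")
        rows.append(
            {
                "city": item["data-city"],  # read an attribute
                "price": int(price.get_text(strip=True)),  # read the element's text
            }
        )
    return rows


rows = extract_houses(HTML)
print(rows)
```

The selector skips the `li.ad` element, so only the two house listings end up in `rows`. A list of dictionaries like this can be passed straight to `pandas.DataFrame` if you want a tabular data set.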

Prerequisites

Some experience with Python is helpful but definitely not required. Even if you've never programmed before, you'll still be able to get some valuable lessons out of this master class. If you want to follow along, you need a recent version of Python together with the packages requests and beautifulsoup4. I might use a little bit of pandas to create a data set.

To inspect HTTP requests and HTML, I'll be using Firefox's developer tools. Most other browsers offer similar functionality, but if you want to follow along with me, I recommend you have Firefox ready.

Recording

If you can't make it, you can always watch the edited recording on YouTube. Register to get an email once it's available. That said, I strongly recommend you join the session so you can ask me questions directly and join the breakout discussions.

About Jeroen

Jeroen Janssens, PhD, is a data science consultant and certified instructor. His expertise lies in visualizing data, implementing machine learning models, and building solutions using Python, R, JavaScript, and Bash. He’s passionate about helping and teaching others to do such things.

Since 2013, Jeroen has run Data Science Workshops, a training and coaching firm that organizes open enrollment workshops, in-company courses, inspiration sessions, hackathons, and meetups. Clients include Amazon, eHealth Africa, Schiphol Airport, The New York Times, and T-Mobile.

Previously, he was an assistant professor at Jheronimus Academy of Data Science and a data scientist at Elsevier in Amsterdam and various startups in New York City. He is the author of Data Science at the Command Line (O’Reilly Media, 2021). Jeroen holds a PhD in machine learning from Tilburg University and an MSc in artificial intelligence from Maastricht University.

He lives with his wife and two kids in Rotterdam, the Netherlands. If you would like to know more about his services, fees, and availability, then please email Jeroen. You can also find him on Twitter, GitHub, and LinkedIn.

One more thing...

On May 17, I'm giving another free, live master class about Data Visualization with Python and Plotnine. It's useful if you want to learn how to gain insight into that hard-won data set!