Scrapy: Powerful Web Scraping & Crawling with Python

4.9

315,475 ratings
11 Lessons
229 Students
Last updated 3 months ago

By GoTrained Academy

via Udemy

About This Course

Python Scrapy Tutorial - Learn how to scrape websites and build a powerful web crawler using Scrapy, Splash and Python

Why this course?

Join the most popular course on Web Scraping with Scrapy, Selenium and Splash.
Learn from a professional instructor, Lazar Telebak, a full-time Web Scraping Consultant.
Apply real-world examples and practical projects of Web Scraping popular websites.
Get the most up-to-date course and the only course with 10+ hours of playable content.
Empower your knowledge with an active Q&A board to answer all your questions.
30 days money-back guarantee.

Scrapy is a free and open source web crawling framework, written in Python. Scrapy is useful for web scraping and extracting structured data which can be used for a wide range of useful applications, like data mining, information processing or historical archival. This Python Scrapy tutorial covers the fundamentals of Scrapy.

Web scraping is a technique for gathering data or information on web pages. You could revisit your favorite web site every time it updates for new information, or you could write a web scraper to have it do it for you!

Web crawling is usually the very first step of data research. Whether you are looking to obtain data from a website, track changes on the internet, or use a website API, web crawlers are a great way to get the data you need.

A web crawler, also known as a web spider, is an application that scans the World Wide Web and extracts information automatically. While they have many components, web crawlers fundamentally follow a simple process: download the raw data, process it to extract the parts you need, and, if desired, store the data in a file or database. There are many ways to do this, and many languages in which you can build your web crawler or spider.
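The download, extract, and store steps described above can be sketched with nothing but the Python standard library. This is a minimal illustration rather than how Scrapy works internally; the names `download`, `AnchorParser`, `extract`, and `store` are hypothetical and chosen only for this example.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.request import urlopen


def download(url: str) -> str:
    """Step 1: fetch the raw HTML (performs a real network request when called)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


class AnchorParser(HTMLParser):
    """Step 2 helper: collect an (href, text) pair for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <a href=...> tag.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract(html: str) -> list:
    """Step 2: turn raw HTML into structured rows."""
    parser = AnchorParser()
    parser.feed(html)
    return parser.links


def store(rows, out) -> None:
    """Step 3: write the rows to any file-like object as CSV."""
    writer = csv.writer(out)
    writer.writerow(["href", "text"])
    writer.writerows(rows)


if __name__ == "__main__":
    # An inline HTML string keeps the demo offline; swap in download(url) for a real page.
    sample = '<p><a href="/a">First</a> and <a href="/b">Second</a></p>'
    buffer = io.StringIO()
    store(extract(sample), buffer)
    print(buffer.getvalue())
```

A framework such as Scrapy takes over the scheduling, retrying, and exporting that a hand-written script like this would otherwise have to grow on its own.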

Before Scrapy, Python developers relied on various software packages for this job, such as the widely used urllib2 and BeautifulSoup. Scrapy is a newer Python package that aims at easy, fast, and automated web crawling, and it has recently gained much popularity.

Scrapy is now widely requested by many employers, for both freelancing and in-house jobs, and that was one important reason for creating this Python Scrapy tutorial: to help you enhance your skills and earn more income.

In this Scrapy tutorial, you will learn how to install Scrapy. You will also build a basic and an advanced spider, and learn more about Scrapy architecture. Then you will learn about deploying spiders and logging into websites with Scrapy. We will build a generic web crawler with Scrapy, and we will also integrate Splash and Selenium to work with Scrapy to iterate over our pages. We will build an advanced spider with the option to iterate over our pages, look at the Scrapy Close function, and then discuss Scrapy arguments. Finally, in this course, you will learn how to save the output to databases, MySQL and MongoDB. There is a dedicated section for diverse web scraping solved exercises... and updating.

One of the main advantages of Scrapy is that it is built on top of Twisted, an asynchronous networking framework. "Asynchronous" means that you do not have to wait for one request to finish before making another, so many requests can be in flight at once. Because it uses non-blocking (aka asynchronous) code for concurrency, Scrapy is really efficient.
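To make the idea of not waiting for one request before starting the next concrete, here is a small sketch. It uses Python's built-in asyncio rather than Twisted, and it simulates network latency with a sleep instead of real requests, so it only illustrates the concurrency principle and not Scrapy's actual engine. The names `fake_request` and `crawl` are hypothetical.

```python
import asyncio
import time


async def fake_request(url: str, delay: float) -> str:
    """Stand-in for a network request: sleeps instead of doing real I/O."""
    await asyncio.sleep(delay)
    return f"response from {url}"


async def crawl(urls, delay: float = 0.2):
    # All simulated requests are started together; none waits for the previous one.
    return await asyncio.gather(*(fake_request(url, delay) for url in urls))


if __name__ == "__main__":
    urls = [f"https://example.com/page/{i}" for i in range(5)]
    start = time.perf_counter()
    results = asyncio.run(crawl(urls))
    elapsed = time.perf_counter() - start
    # Five sequential 0.2 s waits would take about 1 s; run concurrently they overlap.
    print(len(results), "responses in", round(elapsed, 2), "seconds")
```

The total time stays close to the slowest single request instead of the sum of all of them, which is the property that makes a non-blocking crawler efficient.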

It is worth noting that Scrapy tries not only to solve the content extraction (called scraping), but also the navigation to the relevant pages for the extraction (called crawling). To achieve that, a core concept in the framework is the Spider -- in practice, a Python object with a few special features, for which you write the code and the framework is responsible for triggering it.

Scrapy provides many of the functions required for downloading websites and other content on the internet, making the development process quicker and less programming-intensive. This Python Scrapy tutorial will teach you how to use Scrapy to build web crawlers and web spiders.

Scrapy is the most popular tool for web scraping and crawling written in Python. It is simple and powerful, with lots of features and possible extensions.

Python Scrapy Tutorial Topics:

This Scrapy course starts by covering the fundamentals of using Scrapy, and then concentrates on Scrapy advanced features of creating and automating web crawlers. The main topics of this Python Scrapy tutorial are as follows:

What Scrapy is, the differences between Scrapy and other Python-based web scraping libraries such as BeautifulSoup, LXML, Requests, and Selenium, and when it is better to use Scrapy.

This tutorial starts with how to create a Scrapy project and then build a basic Spider to scrape data from a website.

Exploring XPath expressions and how to use them with Scrapy to extract data.

Building a more advanced Scrapy spider to iterate multiple pages of a website and scrape data from each page.

Scrapy Architecture: the overall layout of a Scrapy project; what each field represents and how you can use them in your spider code.

Web Scraping best practices to avoid getting banned by the websites you are scraping.
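As one possible starting point, a project's settings.py can slow the crawler down and identify it honestly. The setting names below are standard Scrapy settings; the specific values and the user agent string are only assumptions for illustration and should be tuned per site.

```python
# settings.py (fragment): example values only, adjust for the target site

ROBOTSTXT_OBEY = True                  # respect the site's robots.txt rules
DOWNLOAD_DELAY = 2                     # seconds to wait between requests to the same site
CONCURRENT_REQUESTS_PER_DOMAIN = 2     # keep parallel requests per domain low

# AutoThrottle adapts the delay to how quickly the server responds.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1
AUTOTHROTTLE_MAX_DELAY = 30

# Hypothetical user agent; identify your crawler and give a contact address.
USER_AGENT = "my-crawler (+https://example.com/contact)"
```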

In this Scrapy tutorial, you will also learn how to deploy a Scrapy web crawler to the Scrapy Cloud platform easily. Scrapy Cloud is a platform from Scrapinghub to run, automate, and manage your web crawlers in the cloud, without the need to set up your own servers.

This Scrapy tutorial also covers how to use Scrapy for web scraping authenticated (logged in) user sessions, i.e. on websites that require a username and password before displaying data.

This course concentrates mainly on how to create an advanced web crawler with Scrapy. We will cover Scrapy's CrawlSpider, which is the most commonly used spider for crawling regular websites, as it provides a convenient mechanism for following links by defining a set of rules. We will also use the LinkExtractor object, which defines how links will be extracted from each crawled page; it allows us to grab all the links on a page, no matter how many of them there are.

Furthermore, there is a complete section in this Scrapy tutorial that shows you how to combine Splash or Selenium with Scrapy to create web crawlers for dynamic web pages. Sometimes you cannot fetch data directly from the source and instead need to load the page, fill in a form, click somewhere, scroll down, and so on. In other words, if you are trying to scrape data from a website that relies on many AJAX calls and JavaScript execution to render its pages, it is good to use Splash or Selenium along with Scrapy.

We will also discuss more functions that Scrapy offers after the spider is done with web scraping, and how to edit and use Scrapy parameters.

As the main purpose of web scraping is to extract data, you will learn how to write the output to CSV, JSON, and XML files.
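In recent Scrapy versions, one way to configure these exports is the `FEEDS` setting in settings.py. The output paths below are placeholders chosen for this sketch.

```python
# settings.py (fragment): write the same scraped items to three files.
# The output paths are placeholders.
FEEDS = {
    "output/items.csv": {"format": "csv"},
    "output/items.json": {"format": "json", "encoding": "utf8", "indent": 2},
    "output/items.xml": {"format": "xml"},
}
```

For a quick one-off export, the command line works as well, for example `scrapy crawl <spider> -o items.csv`, where the format is inferred from the file extension.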

Finally, you will learn how to store the data extracted by Scrapy into MySQL and MongoDB databases.
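Database storage in Scrapy is usually done in an item pipeline. The sketch below shows the general shape of such a pipeline, with one loud substitution: it uses Python's built-in sqlite3 in place of MySQL or MongoDB so that it runs without a database server. The class name, table, and columns are hypothetical, and the method signatures follow the classic pipeline interface.

```python
import sqlite3


class SQLitePipeline:
    """Item pipeline sketch; sqlite3 stands in for MySQL or MongoDB here."""

    def __init__(self, db_path=":memory:"):
        # An in-memory database keeps the demo self-contained; use a file path to persist.
        self.db_path = db_path

    def open_spider(self, spider):
        # Called once when the spider starts: open the connection and ensure the table exists.
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS items (title TEXT, url TEXT UNIQUE)"
        )

    def process_item(self, item, spider):
        # Called for every scraped item; the UNIQUE url column skips duplicates.
        self.conn.execute(
            "INSERT OR IGNORE INTO items (title, url) VALUES (?, ?)",
            (item["title"], item["url"]),
        )
        self.conn.commit()
        return item

    def close_spider(self, spider):
        # Called once when the spider finishes.
        self.conn.close()
```

A pipeline is switched on through the `ITEM_PIPELINES` setting, for example `ITEM_PIPELINES = {"myproject.pipelines.SQLitePipeline": 300}`, where `myproject` is a placeholder project name. A MySQL or MongoDB version would keep the same three methods and swap the connection and insert calls for those of the chosen database driver.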

Creating a web crawler in Scrapy
Crawling single or multiple pages and scraping data
Deploying & Scheduling Spiders to ScrapingHub

Course Curriculum

Scrapy vs. Other Python Web Scraping Frameworks

2 Lectures

Scrapy vs. Beautiful Soup vs. Selenium

Course Tips (Must Read)

Scrapy Installation

5 Lectures

Linux Scrapy Installation

Mac Scrapy Installation

Windows Scrapy Installation

Scrapy Installation Instructions

Python Editor: Sublime Text

Building Basic Spider with Scrapy

3 Lectures

Scrapy Simple Spider - Part 1

Scrapy Simple Spider - Part 2

Scrapy Simple Spider - Part 3

XPath Syntax

2 Lectures

Using XPath with Scrapy

Tools to Easily Get XPath

Q&A

1 Lecture

Scrapy Basics

Do you have questions so far?

XPath Syntax

Building More Advanced Spider with Scrapy

5 Lectures

Scrapy Advanced Spider - Part 1

Scrapy Advanced Spider - Part 2

Scrapy Advanced Spider - Part 3

Scrapy Advanced Spider - Part 4

Scrapy Architecture

Web Scraping Best Practices

1 Lecture

Avoid Getting Banned!

Deploying & Scheduling Scrapy Spider on ScrapingHub

1 Lecture

ScrapingHub: Deploying & Scheduling Scrapy Spiders (UPDATED)

Logging into Websites Using Scrapy

1 Lecture

Logging into Websites Using Scrapy

Scrapy as a Standalone Script (UPDATED)

1 Lecture

Scrapy as a Standalone Script (UPDATED)

Building Web Crawler with Scrapy

1 Lecture

Building Web Crawler with Scrapy

Scrapy with Selenium

4 Lectures

Why/When We Should Use Selenium

Selenium WebDriver + Scrapy Selector to Extract URLs

Selenium Loading Next for Data Extraction (usable even with JavaScript pages)

Getting Data

Scrapy with Splash - JavaScript Websites

6 Lectures

Splash Prerequisite: Install Docker (NEW)

Splash Installation (NEW)

How to use Splash with Scrapy (NEW)

Splash Advanced Project: Scraping Baierl.com p.1 (NEW)

Splash Advanced Project: Scraping Baierl.com p.2 (NEW)

Splash Advanced Project: Scraping Baierl.com p.3 (NEW)

Scrapy Spider - Bookstore

2 Lectures

Grabbing URLs

Data Extraction

More about Scrapy

3 Lectures

Scrapy Arguments

Scrapy Close Function

Scrapy Items

Export Output to Files

4 Lectures

Scrapy Feed Exports to CSV, JSON, or XML

Export Output to Excel

Downloading Images with Scrapy Pipelines

Renaming Images with Scrapy Pipelines

Scrapy Project #1: Scraping Craigslist Eng Jobs in NY

8 Lectures

Craigslist Scraper - Overview

Creating Scrapy Craigslist Spider

Craigslist Scrapy Spider #1 – Titles

Craigslist Scrapy Spider #2 – One Page

Craigslist Scrapy Spider #3 – Multiple Pages

Craigslist Scrapy Spider #4 – Job Descriptions

Editing Scrapy settings.py (e.g. throttling, user agent, etc.)

Final Scrapy Tutorial, Craigslist Spider Code

Extracting Data to Databases - MySQL & MongoDB

6 Lectures

Installing MySQL

MySQL Installation and Usage

Writing Data to MySQL

Installing MongoDB

MongoDB Installation and Usage

Writing Data to MongoDB

Scrapy Project #2: Web Scraping Class-Central.com

2 Lectures

Scraping Class-Central - Part 1: Subjects (UPDATED)

Scraping Class-Central - Part 2: Courses (UPDATED)

Scrapy Advanced Topics

5 Lectures

Scrapy User Agent

Scraping Tables (UPDATED)

Scraping JSON Pages

Scrapy FormRequest (UPDATED)

Using Multiple Proxies with Crawlera (Optional)

Scrapy Project #3: Web Scraping Dynamic Website eplanning.ie

7 Lectures

ePlanning Scraping Project Overview

ePlanning: Extracting Initial URLs

ePlanning: Crawling Internal Pages

ePlanning: Scrapy Form Requests

ePlanning: Scraping Data

ePlanning: Checking Data Existence

ePlanning: Scraping Data from Table

Project #4: Scraping Shoes' Prices from API Request

3 Lectures

Scraping Product Prices from API Request p.1 (NEW)

Scraping Product Prices from API Request p.2 (NEW)

Scraping Product Prices from API Request p.3 (NEW)

Project #5: Web Scraping LinkedIn.com (UPDATED)

7 Lectures

LinkedIn Scraping Project: Overview & Requirements (UPDATED)

LinkedIn Logging in (UPDATED)

Finding LinkedIn Profiles: Part 1 (UPDATED)

Finding LinkedIn Profiles: Part 2 (UPDATED)

Scraping Data Points from LinkedIn Profiles: Part 1 (UPDATED)

Scraping Data Points from LinkedIn Profiles: Part 2 (UPDATED)

Connecting to LinkedIn Profiles (UPDATED)

Solved Web Scraping Exercises

3 Lectures

Yield Data Items from 2 Functions

How to Order Exported Data

XPath contains() and starts-with() functions

Bonus: Data Extraction with APIs

1 Lecture

Data Extraction with APIs (Free Tutorial)

Bonus: Web Scraping with Beautiful Soup, Requests & Selenium Course

1 Lecture

Coupon for Web Scraping with Beautiful Soup, Requests & Selenium & Other Courses

Instructors

GoTrained Academy

4.9

315,475 Reviews
345 Students
34 Courses

GoTrained is an e-learning academy aiming at creating useful content in different languages and it concentrates on technology and management. We adopt a special approach for selecting content we provide; we mainly focus on skills that are frequently requested by clients and jobs while there are only few videos that cover them. We also try to build video series to...

More Courses By Waqar Ahmed, GoTrained Academy, Faizan Ali

Web Scraping with Python: BeautifulSoup, Requests & Selenium

4.9

(230)

NLTK: Build Document Classifier & Spell Checker with Python

4.9

(230)

Chatbot Building with Rasa

4.9

(230)

Review

4.9 course rating

4K ratings

Mohammad Z.

5.0

11 months ago

Going good.... expecting more ahead !!

Murat I. A.

1.0

1 year ago

unnecessarily fast and without sufficient explanation or teaching just proceeds, there are like thousands of youtube videos does the same and you can watch to grab some stuff out of it. videos are extremely fast pace its like hack and slash game but in coding lol..

Joanna W.

3.0

2 years ago

I will say this. After I reached 40% of the course, codes did not work. I lost a lot of time trying to correct them. On the other hand, even if these codes do not work, I still learned basic technologies used in web scraping. In the end I used puppeteer and javascript, but these classes were useful. It is just a pity that the codes did not work.

Lourens S.

3.0

2 years ago

The course is good, but there are just too many assumptions about previous knowledge. He goes over many things without explaining what he is doing, which does not help for learning.

Olusegun O.

5.0

2 years ago

Great experience except that the Craigslist website has been updated and it's hard now to replicate the tutor's results.

Kenneth K.

5.0

3 years ago

This is one of the best courses I've come across. Well laid out and purposefully developed to make you learn faster. Thank you, Lazar.

Andrzej K.

4.0

3 years ago

Good introduction to Scrapy and the topic of web scraping in general. A downside is that it's a little repetitive with the new and old content covering the same steps.

Anish S.

2.0

3 years ago

The course is fine but they claim that you can post any pages you are having difficulty scraping in the questions and they will review if the same has not been covered but they don't respond to the questions

David M.

5.0

3 years ago

Very clear and easy to follow, also Lazar is very responsive with Q&As. The only course you'll need to pick up webscraping!

Evan S.

4.0

3 years ago

It'd be good to see the python 2 code updated to python 3.

I obviously don't have expertise in the scrapy framework, but I'd wager there are more preferred ways to organize code for large projects - I'd have wanted to see some of those best practices too

I wish the volume levels were the same (or more similar) across videos

This course includes:

54.5 hours on-demand video
3 articles
249 downloadable resources
Access on mobile and TV
Full lifetime access
Certificate of completion
