Article Categories
- All Categories
-
Data Structure
-
Networking
-
RDBMS
-
Operating System
-
Java
-
MS Excel
-
iOS
-
HTML
-
CSS
-
Android
-
Python
-
C Programming
-
C++
-
C#
-
MongoDB
-
MySQL
-
Javascript
-
PHP
-
Economics & Finance
Python script to monitor website changes
In today's digital age, staying up to date with the latest changes on a website is crucial for various purposes, such as tracking updates on a competitor's site, monitoring product availability, or staying informed about important information. Manually checking websites for changes can be time-consuming and inefficient. That's where automation comes in.
In this tutorial, we will explore how to create a Python script to monitor website changes. By leveraging the power of Python and some handy libraries, we can automate the process of retrieving website content, comparing it with previous versions, and notifying us of any changes.
Setting up the Environment
Before we begin writing the script to monitor website changes, we need to set up our Python environment and install the necessary libraries. Follow the steps below to get started ?
Install Python ? If you haven't already, download and install Python on your system. You can visit the official Python website (https://www.python.org/) and download the latest version compatible with your operating system.
Create a New Python Virtual Environment (optional) ? It's recommended to create a virtual environment for this project to keep the dependencies isolated.
python -m venv website-monitor-env
This will create a new virtual environment named "website-monitor-env" in your project directory.
Activate the Virtual Environment ? Activate the virtual environment by running the appropriate command based on your operating system:
For Windows ?
website-monitor-env\Scripts\activate.bat
For macOS/Linux ?
source website-monitor-env/bin/activate
Install Required Libraries ? With the virtual environment activated, install the necessary libraries:
pip install requests beautifulsoup4
The "requests" library will help us retrieve website content, while "beautifulsoup4" will assist in parsing HTML.
Retrieving Website Content
To monitor website changes, we need to retrieve the current content of the website and compare it with the previously saved version. Here's how to fetch website content using the "requests" library ?
Basic Website Content Retrieval
import requests
from bs4 import BeautifulSoup
# Specify the website URL
url = "https://httpbin.org/html"
# Send a GET request and retrieve the content
response = requests.get(url)
# Check the response status
if response.status_code == 200:
print("Successfully retrieved website content")
print("Content length:", len(response.text))
print("First 200 characters:")
print(response.text[:200])
else:
print("Failed to retrieve website content. Status code:", response.status_code)
Successfully retrieved website content
Content length: 3741
First 200 characters:
<!DOCTYPE html>
<html>
<head>
</head>
<body>
<h1>Herman Melville - Moby-Dick</h1>
<div>
<p>
Availing himself of the mild, summer-cool weather that now reigned in these latitudes, and in prepar
Saving and Comparing Website Content
Once we have retrieved the website content, we need to save it for future comparison. Here's a complete example that demonstrates saving, loading, and comparing website content ?
import requests
import os
def check_website_changes(url, filename="website_content.txt"):
try:
# Send GET request to the website
response = requests.get(url)
if response.status_code != 200:
print(f"Failed to retrieve content. Status code: {response.status_code}")
return
current_content = response.text
# Check if previous content exists
if os.path.exists(filename):
with open(filename, "r", encoding="utf-8") as file:
previous_content = file.read()
# Compare current content with previous content
if current_content == previous_content:
print("No changes detected.")
else:
print("Website content has changed!")
print(f"Previous content length: {len(previous_content)}")
print(f"Current content length: {len(current_content)}")
# Update the saved content with new version
with open(filename, "w", encoding="utf-8") as file:
file.write(current_content)
print("Content updated successfully.")
else:
# First time running - save initial content
with open(filename, "w", encoding="utf-8") as file:
file.write(current_content)
print("Initial content saved successfully.")
except requests.RequestException as e:
print(f"Error fetching website: {e}")
except IOError as e:
print(f"Error with file operations: {e}")
# Test the function
url = "https://httpbin.org/html"
check_website_changes(url)
Initial content saved successfully.
Complete Website Monitor Script
Here's a complete automated website monitoring script that runs continuously ?
import requests
import time
import os
from datetime import datetime
class WebsiteMonitor:
def __init__(self, url, filename="website_content.txt", interval=300):
self.url = url
self.filename = filename
self.interval = interval # seconds
def get_current_content(self):
"""Fetch current website content"""
try:
response = requests.get(self.url, timeout=10)
if response.status_code == 200:
return response.text
else:
print(f"HTTP Error: {response.status_code}")
return None
except requests.RequestException as e:
print(f"Request failed: {e}")
return None
def load_previous_content(self):
"""Load previously saved content"""
if os.path.exists(self.filename):
try:
with open(self.filename, "r", encoding="utf-8") as file:
return file.read()
except IOError as e:
print(f"Error reading file: {e}")
return None
def save_content(self, content):
"""Save content to file"""
try:
with open(self.filename, "w", encoding="utf-8") as file:
file.write(content)
return True
except IOError as e:
print(f"Error saving file: {e}")
return False
def detect_changes(self):
"""Check for website changes"""
current_content = self.get_current_content()
if current_content is None:
return False
previous_content = self.load_previous_content()
if previous_content is None:
# First time - save initial content
if self.save_content(current_content):
print(f"[{datetime.now()}] Initial content saved.")
return False
if current_content != previous_content:
print(f"[{datetime.now()}] CHANGE DETECTED!")
print(f"Previous length: {len(previous_content)}")
print(f"Current length: {len(current_content)}")
# Save new content
if self.save_content(current_content):
print("New content saved.")
# Here you can add notification logic
# self.send_notification()
return True
else:
print(f"[{datetime.now()}] No changes detected.")
return False
def start_monitoring(self):
"""Start continuous monitoring"""
print(f"Starting website monitoring for: {self.url}")
print(f"Check interval: {self.interval} seconds")
print("Press Ctrl+C to stop monitoring\n")
try:
while True:
self.detect_changes()
time.sleep(self.interval)
except KeyboardInterrupt:
print("\nMonitoring stopped by user.")
# Usage example
if __name__ == "__main__":
website_url = "https://httpbin.org/html"
monitor = WebsiteMonitor(website_url, interval=60) # Check every minute
monitor.start_monitoring()
Advanced Features
You can enhance the basic monitor with additional features ?
Monitoring Specific Content
import requests
from bs4 import BeautifulSoup
def monitor_specific_content(url, css_selector):
"""Monitor specific elements using CSS selectors"""
try:
response = requests.get(url)
if response.status_code == 200:
soup = BeautifulSoup(response.content, 'html.parser')
# Extract specific content using CSS selector
elements = soup.select(css_selector)
content = [elem.get_text(strip=True) for elem in elements]
print("Monitored content:")
for i, text in enumerate(content, 1):
print(f"{i}. {text}")
return content
else:
print(f"Error: {response.status_code}")
return None
except Exception as e:
print(f"Error: {e}")
return None
# Example: Monitor all paragraph text
url = "https://httpbin.org/html"
content = monitor_specific_content(url, "p")
Monitored content: 1. Availing himself of the mild, summer-cool weather that now reigned in these latitudes, and in preparation for the peculiarly active pursuits shortly to be anticipated, Perth, the begrimed, blistered old blacksmith, had not removed his portable forge to the hold again, after concluding his contributory work for Ahab's leg, but still retained it on deck, fast lashed to ringbolts by the foremast; being now almost incessantly invoked by the headsmen, and harpooneers, and bowsmen to do some little job for them; altering, or repairing, or new shaping their various weapons and boat furniture. Often he would be surrounded by an eager circle, all waiting to be served; holding boat-spades, pike-heads, harpoons, and lances, and jealously watching his every sooty movement, as he toiled. Nevertheless, this old man's was a patient hammer wielded by a patient arm. No murmur, no impatience, no petulance did come from him. Silent, slow, and solemn; bowing over still further his chronically broken back, he toiled away, as if toil were life itself, and the heavy beating of his hammer the heavy beating of his heart. And so it was.?Most miserable! A peculiar walk in this old man, a certain slight but painful appearing yawing in his gait, had at an early period of the voyage excited the curiosity of the mariners. And to the importunity of their persisted questionings he had finally given in; and so it came to pass that every one now knew the shameful story of his wretched fate. Belated, and not innocently, one bitter winter's midnight, on the road running between two country towns, the blacksmith half-stupidly felt the deadly numbness stealing over him, and sought refuge in a leaning, dilapidated barn. The issue was, the loss of the extremities of both feet. Out of this revelation, part by part, at last came out the four acts of the gladness, and the one long, and as yet uncatastrophied fifth act of the grief of his life's drama. He was an old man, who, at the age of nearly sixty, had postponedly encountered that thing in sorrow's technicals called ruin. He had been an artisan of famed excellence, and with plenty to do; owned a house and garden; embraced a youthful, daughter-like, loving wife, and three blithe, ruddy children; every Sunday went to a cheerful-looking church, planted in a grove. But one night, under cover of darkness, and further concealed in a most cunning disguisement, a desperate burglar slid into his happy home, and robbed them all of everything. And darker yet to tell, the blacksmith himself did ignorantly conduct this burglar into his family's heart. It was the Bottle Conjuror! Upon the opening of that fatal cork, forth flew the fiend, and shrivelled up his home. Now, for prudent, most wise, and economic reasons, the blacksmith's shop was in the basement of his dwelling, but with a separate entrance to it; so that always had the young and loving healthy wife listened with no unhappy nervousness, but with vigorous pleasure, to the stout ringing of her young-armed old husband's hammer; whose reverberations, muffled by passing through the floors and walls, came up to her, not unsweetly, in her nursery; and so, to stout Labor's iron lullaby, the blacksmith's infants were rocked to slumber. Oh, woe on woe! Oh, Death, why canst thou not sometimes be timely? Hadst thou taken this old blacksmith to thyself ere his full ruin came upon him, then had the young widow had a delicious grief, and her orphans a truly venerable, legendary sire to dream of in their after years; and all of them a care-killing competency.
Conclusion
Website monitoring with Python provides an automated solution for tracking changes across web content. The script combines HTTP requests, file I/O operations, and scheduling to create a robust monitoring system that can detect modifications in real-time. This approach is valuable for competitive analysis, content tracking, and staying informed about important website updates.
