# How to Perform Static Web Scraping and Convert Addresses to Coordinates with Python

## 1. Static Web Scraping Basics

Static web scraping means extracting data directly from HTML pages that do not require JavaScript rendering. Thanks to its rich ecosystem of libraries, Python is the language of choice for scraper development.

### 1.1 Core Libraries

```python
import requests                  # HTTP requests
from bs4 import BeautifulSoup    # HTML parsing
import pandas as pd              # data storage
```
### 1.2 Basic Scraping Flow

```python
url = "https://example.com"
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Example: extract all links
links = [a['href'] for a in soup.find_all('a', href=True)]
```
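BeautifulSoup also supports CSS selectors through `select()` and `select_one()`, which the address-scraping example further below relies on. A minimal illustration, using a made-up HTML fragment:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment, used only to illustrate CSS selectors
html = """
<div class="address-item">
  <h2>Beijing</h2>
  <span class="street">1 Example Road</span>
</div>
"""
soup = BeautifulSoup(html, 'html.parser')

for item in soup.select('.address-item'):           # CSS class selector
    print(item.find('h2').text)                     # -> Beijing
    print(item.select_one('span.street').text)      # -> 1 Example Road
```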
The following example shows how to collect address information from a static page:
```python
def scrape_addresses():
    url = "http://www.address-source.com/cities"    # placeholder URL, replace with the real source
    headers = {'User-Agent': 'Mozilla/5.0'}
    try:
        response = requests.get(url, headers=headers)
        response.raise_for_status()
        soup = BeautifulSoup(response.content, 'lxml')

        addresses = []
        for item in soup.select('.address-item'):
            addr = {
                'city': item.find('h2').text.strip(),
                'street': item.find('span', class_='street').text,
                'zipcode': item.find('span', class_='zip').text
            }
            addresses.append(addr)
        return pd.DataFrame(addresses)
    except Exception as e:
        print(f"Scraping failed: {e}")
        return None
```
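As a quick sanity check, you might call the scraper and preview the resulting DataFrame. Keep in mind that the URL and CSS selectors above are placeholders, so this only works against a real source:

```python
address_df = scrape_addresses()
if address_df is not None:
    print(address_df.head())                         # preview the first few rows
    print(len(address_df), "addresses scraped")
else:
    print("No data returned - check the URL and selectors")
```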
## 2. Converting Addresses to Latitude and Longitude

Geocoding converts a textual address into latitude/longitude coordinates. The following services are recommended:

- Amap (高德地图) API (recommended for mainland China)
- Google Maps Geocoding API
- Baidu Maps (百度地图) API
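For comparison, a call to the Google Maps Geocoding API follows the same request/parse pattern. This is only a sketch with a hypothetical helper name, assuming you already have a Google API key; the rest of this article uses Amap:

```python
import requests

def google_geocode(address, api_key):
    """Return (lng, lat) for an address via the Google Maps Geocoding API, or None."""
    url = "https://maps.googleapis.com/maps/api/geocode/json"
    params = {'address': address, 'key': api_key}
    data = requests.get(url, params=params).json()

    if data.get('status') == 'OK' and data.get('results'):
        loc = data['results'][0]['geometry']['location']
        return loc['lng'], loc['lat']
    return None
```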
The example below uses the Amap geocoding endpoint:

```python
def gaode_geocode(address, api_key):
    """Convert a single address to (longitude, latitude) via the Amap geocoding API."""
    base_url = "https://restapi.amap.com/v3/geocode/geo"
    params = {
        'address': address,
        'key': api_key,
        'output': 'JSON'
    }
    response = requests.get(base_url, params=params)
    data = response.json()

    if data['status'] == '1' and data['geocodes']:
        location = data['geocodes'][0]['location']   # "lng,lat" string
        lng, lat = location.split(',')
        return float(lng), float(lat)
    return None
```
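A single-address call might look like the following; the key and the sample address are placeholders:

```python
api_key = "your_amap_api_key"                        # placeholder key
coords = gaode_geocode("北京市朝阳区阜通东大街6号", api_key)  # example address
if coords:
    lng, lat = coords
    print(f"lng={lng}, lat={lat}")
else:
    print("Geocoding failed")
```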
To geocode a whole DataFrame of addresses in one pass:

```python
def batch_geocode(df, api_key):
    """Geocode every address in df['full_address'] and return a new DataFrame."""
    results = []
    for addr in df['full_address']:
        coords = gaode_geocode(addr, api_key)
        results.append({
            'address': addr,
            'longitude': coords[0] if coords else None,
            'latitude': coords[1] if coords else None
        })
    return pd.DataFrame(results)
```
## 3. Complete Workflow

```python
# Step 1: scrape the address data
address_df = scrape_addresses()

# Step 2: build the full address string
address_df['full_address'] = (address_df['city']
                              + address_df['street']
                              + address_df['zipcode'])

# Step 3: geocode the addresses
api_key = "your_amap_api_key"   # apply for this key in advance
geo_df = batch_geocode(address_df, api_key)

# Step 4: save the results
geo_df.to_csv('address_with_coordinates.csv', index=False)
```
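Some addresses may fail to geocode and come back with empty coordinates, so it is worth separating those rows before mapping or analysis. One possible follow-up step, using the column names defined in `batch_geocode` above:

```python
# Keep only rows that were geocoded successfully
valid_df = geo_df.dropna(subset=['longitude', 'latitude'])
failed = geo_df[geo_df['longitude'].isna()]

print(f"{len(valid_df)} geocoded, {len(failed)} failed")
failed.to_csv('failed_addresses.csv', index=False)   # retry these later if needed
```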
When scraping pages or calling the geocoding API in bulk, add a short pause between requests so you do not overload the target site or exceed API rate limits:

```python
import time
time.sleep(1)   # wait 1 second between requests
```
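In practice the delay belongs inside the geocoding loop itself. A sketch of a throttled variant of `batch_geocode` (same logic as before, plus `time.sleep`):

```python
import time
import pandas as pd

def batch_geocode_throttled(df, api_key, delay=1.0):
    """Like batch_geocode, but pauses `delay` seconds between API calls."""
    results = []
    for addr in df['full_address']:
        coords = gaode_geocode(addr, api_key)
        results.append({
            'address': addr,
            'longitude': coords[0] if coords else None,
            'latitude': coords[1] if coords else None
        })
        time.sleep(delay)          # throttle requests to stay within rate limits
    return pd.DataFrame(results)
```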
## 4. Visualizing the Results

The geocoded points can be plotted on an interactive map with folium:

```python
import folium

def create_map(geo_df):
    """Plot the geocoded addresses on an interactive folium map."""
    m = folium.Map(location=[geo_df['latitude'].mean(),
                             geo_df['longitude'].mean()],
                   zoom_start=12)
    for _, row in geo_df.iterrows():
        folium.Marker([row['latitude'], row['longitude']],
                      popup=row['address']).add_to(m)
    return m
```
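To generate the map, pass in the successfully geocoded rows and save the result to an HTML file that you can open in a browser:

```python
m = create_map(geo_df.dropna(subset=['latitude', 'longitude']))
m.save('address_map.html')   # open this file in a browser to view the map
```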
With the approach above, you can efficiently collect address data and convert it into geographic coordinates, laying a solid data foundation for subsequent spatial analysis.
(Note: when using this in practice, replace the example URLs and API key with real values, and make sure you comply with the target website's terms of use.)