Guzzle is a popular PHP HTTP client for web scraping, and proxies are an integral part of web scraping. Here's a quick introduction to using proxies with Guzzle:
<?php
require 'vendor/autoload.php';

use GuzzleHttp\Client;

// Proxy URL pattern: scheme://username:password@IP:PORT
// HTTP proxy without authentication:
$my_proxy = "http://160.11.12.13:1020";
// Proxy with authentication:
$my_proxy = "http://my_username:my_password@160.11.12.13:1020";
// The username and password must be URL-encoded if they contain
// URL-sensitive characters such as "@":
$my_proxy = 'http://'.urlencode('foo@bar.com').':'.urlencode('password@123').'@160.11.12.13:1020';

$client = new Client([
    // Base URI is used with relative requests
    'base_uri' => 'https://httpbin.dev',
    // You can set any number of default request options.
    'timeout' => 2.0,
    // Keys of the proxy array are URI schemes plus an optional 'no' list;
    // full URLs such as 'https://httpbin.dev' are not matched as keys.
    'proxy' => [
        'http' => $my_proxy,   // This proxy will be applied to all 'http' URLs
        'https' => $my_proxy,  // This proxy will be applied to all 'https' URLs
        'no' => ['localhost'], // Hosts that should bypass the proxy
    ],
]);

$response = $client->request('GET', '/ip');
$body = $response->getBody();
print($body);
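The encoding step above is easy to get wrong when credentials come from configuration, so it can help to wrap it in a small function. The sketch below is one way to do that; `build_proxy_url` is a hypothetical helper name, not part of Guzzle, and it uses `rawurlencode` rather than `urlencode` on the assumption that spaces in credentials should become `%20` instead of `+`.

```php
<?php
// Hypothetical helper (not a Guzzle API): builds a proxy URL of the form
// scheme://username:password@host:port, percent-encoding the credentials.
function build_proxy_url(
    string $host,
    int $port,
    ?string $user = null,
    ?string $pass = null,
    string $scheme = 'http'
): string {
    $auth = '';
    if ($user !== null && $user !== '') {
        // rawurlencode() encodes "@" as %40 and a space as %20
        $auth = rawurlencode($user);
        if ($pass !== null && $pass !== '') {
            $auth .= ':' . rawurlencode($pass);
        }
        $auth .= '@';
    }
    return $scheme . '://' . $auth . $host . ':' . $port;
}

// No authentication:
echo build_proxy_url('160.11.12.13', 1020), PHP_EOL;
// With credentials that contain "@":
echo build_proxy_url('160.11.12.13', 1020, 'foo@bar.com', 'password@123'), PHP_EOL;
```

The returned string can be used as the value of the `http` and `https` keys in the `proxy` option shown earlier.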
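Guzzle has no SOCKS implementation of its own. SOCKS proxies only work when Guzzle runs on its cURL handler, because the proxy URL (for example socks5://160.11.12.13:1020) is handed to libcurl, which speaks SOCKS; the PHP stream handler does not support them. The other options are PHP's curl extension used directly, or Buzz.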
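For the plain cURL route, a minimal sketch could look like the following. It assumes the PHP curl extension is installed; `socks5_curl_options` and `fetch_via_socks5` are hypothetical helper names, and the proxy address is the same placeholder used elsewhere in this article.

```php
<?php
// Hypothetical helper: cURL options for fetching $url through a SOCKS5 proxy.
function socks5_curl_options(string $url, string $proxyHost, int $proxyPort): array
{
    return [
        CURLOPT_URL => $url,
        CURLOPT_PROXY => $proxyHost . ':' . $proxyPort,
        // SOCKS5_HOSTNAME asks the proxy to resolve DNS names as well
        CURLOPT_PROXYTYPE => CURLPROXY_SOCKS5_HOSTNAME,
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_TIMEOUT => 10,          // seconds
    ];
}

// Hypothetical helper: performs the request and returns the response body.
function fetch_via_socks5(string $url, string $proxyHost, int $proxyPort): string
{
    $ch = curl_init();
    curl_setopt_array($ch, socks5_curl_options($url, $proxyHost, $proxyPort));
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('Request failed: ' . curl_error($ch));
    }
    return $body;
}

// Example call (needs a reachable SOCKS5 proxy, so it is left commented out):
// echo fetch_via_socks5('https://httpbin.dev/ip', '160.11.12.13', 1020);
```

Keeping the option array in its own function makes it possible to inspect or log the settings without sending a request.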
Note that the Guzzle proxy can also be set through the standard *_PROXY environment variables (Guzzle honors HTTP_PROXY only when PHP runs from the command line):
$ export HTTP_PROXY="http://160.11.12.13:1020"
$ export HTTPS_PROXY="http://160.11.12.13:1020"
$ export ALL_PROXY="socks://160.11.12.13:1020"
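Guzzle reads these variables on its own, but it can be useful to make the lookup explicit, for example to log which proxy is in effect before scraping. The sketch below collects the variables into the array shape accepted by Guzzle's `proxy` option; `proxy_option_from_env` is a hypothetical helper, it only approximates Guzzle's own behavior, and it deliberately ignores ALL_PROXY.

```php
<?php
// Hypothetical helper: gathers proxy settings from the environment into
// the array shape used by Guzzle's 'proxy' option ('http', 'https', 'no').
function proxy_option_from_env(): array
{
    $proxy = [];

    // HTTP_PROXY is only trusted on the command line, since web servers
    // may fill it from a request header.
    $http = getenv('HTTP_PROXY');
    if (PHP_SAPI === 'cli' && $http !== false && $http !== '') {
        $proxy['http'] = $http;
    }

    $https = getenv('HTTPS_PROXY');
    if ($https !== false && $https !== '') {
        $proxy['https'] = $https;
    }

    // NO_PROXY is a comma-separated list of hosts that bypass the proxy
    $no = getenv('NO_PROXY');
    if ($no !== false && $no !== '') {
        $proxy['no'] = array_map('trim', explode(',', $no));
    }

    return $proxy;
}

// Show what would be used in the current environment
print_r(proxy_option_from_env());
```

The result can be passed as `'proxy' => proxy_option_from_env()` when constructing the client, or simply printed for debugging.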
When web scraping, it's best to use a different proxy for each request. For details, see our article: How to Rotate Proxies in Web Scraping
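As a very small illustration of the idea, the sketch below picks a random proxy from a pool for each request. The pool addresses and the `pick_proxy` helper are hypothetical; a real rotation setup would usually also track failing proxies.

```php
<?php
// Hypothetical proxy pool; replace with real proxy URLs
$proxy_pool = [
    'http://160.11.12.13:1020',
    'http://160.11.12.14:1020',
    'http://160.11.12.15:1020',
];

// Hypothetical helper: returns one random proxy URL from the pool.
function pick_proxy(array $pool): string
{
    if ($pool === []) {
        throw new InvalidArgumentException('Proxy pool is empty');
    }
    return $pool[array_rand($pool)];
}

// With Guzzle, the 'proxy' option can also be passed per request,
// so each call can use a different proxy (requires guzzlehttp/guzzle):
// $response = $client->request('GET', '/ip', ['proxy' => pick_proxy($proxy_pool)]);

echo pick_proxy($proxy_pool), PHP_EOL;
```

Random choice keeps the sketch short; a round-robin counter is an equally reasonable alternative when an even spread across proxies matters.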