
I'm a Java engineer with zero DevOps experience. I recently set up an Ubuntu Linux server for the first time and used Docker with my Selenium project, and I ran into this problem:

I'm trying to scrape HTML from a website, but my requests are blocked with a 403 Forbidden response. I tried to curl the same website and got the same response.

Furthermore, I only get blocked on my Linux server; everything works in my local dev environment with the same Docker image, which is why I think it's a "server fault".

Any idea what my Linux server is missing here? Maybe I'm missing some sort of certificate, or have a CORS problem? What can I try? (For learning purposes only.)

curl call here

  • Pass the web browser and your curl and Java apps through a proxy like mitmproxy and check the requests, especially the headers. I am sure you will see the differences that cause the web server to send different responses. Commented Jan 31, 2022 at 20:31
  • Not really on topic for Server Fault; getting Selenium and curl commands to work is more Stack Overflow. But most likely the site tries to detect scrapers and uses mechanisms like cookies and sessions to identify real interactive users/browsers. Commented Jan 31, 2022 at 20:36
  • @Bob I would say it's Server Fault, because it works on my local machine with the same Docker image. Commented Feb 1, 2022 at 6:28
  • @Robert appreciate your suggestion, I'm going to investigate and update this question. Commented Feb 1, 2022 at 6:30
  • Just being the server's fault doesn't make it on-topic for Server Fault. If this is your server you are trying to scrape, provide your server configuration and log files and we can try to help. If it is not your server, it's off-topic here, and in that case I'd stop doing what you are doing: right now you are just getting a 403, but the next notice might be from a lawyer. Commented Feb 4, 2022 at 9:14
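The mitmproxy suggestion in the first comment could be tried roughly like this (a sketch; `example.com` stands in for the real site, and 8080 is mitmproxy's default listen port):

```shell
# Terminal 1: start mitmproxy (install with: pip install mitmproxy)
#   mitmproxy --listen-port 8080

# Terminal 2: send the same request through the proxy from curl;
# repeat from the Java/Selenium app and from a real browser, then
# compare the captured requests (especially the headers) in the
# mitmproxy UI.
curl --proxy http://127.0.0.1:8080 -k -v https://example.com/
```

The `-k` flag skips certificate verification, which is needed here because mitmproxy re-signs TLS traffic with its own CA unless you install that CA on the client.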

1 Answer


I believe you're getting rate-limited or blocked by the website. If I run the same curl command from my laptop, I get the webpage back.
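One of the most common reasons the same request succeeds on one machine and fails on another is a header difference, especially User-Agent: curl identifies itself as `curl/7.x` by default, and many sites block that outright. A hedged sketch of retrying with browser-like headers (`example.com` is a placeholder):

```shell
# Send the same request, but with headers that resemble a desktop
# browser; if this returns 200 where plain curl returned 403, the
# block is based on request headers rather than on the client's IP.
curl -v \
  -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0 Safari/537.36' \
  -H 'Accept: text/html,application/xhtml+xml' \
  -H 'Accept-Language: en-US,en;q=0.9' \
  https://example.com/
```

If the headers are identical on both machines and the server still responds differently, the block is more likely keyed to the source IP (cloud-provider address ranges are frequently blocklisted) than to the request itself.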

Remember to respect robots.txt if you're doing web scraping.
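robots.txt is a plain-text file at the site root that tells crawlers which paths they may fetch; a sketch of how to check it and what a restrictive one looks like (`example.com` is a placeholder):

```shell
# Fetch the file before scraping:
#   curl -s https://example.com/robots.txt
# "Disallow" lines list paths the named user-agent should not
# request; "Crawl-delay" asks crawlers to pause between requests.
cat <<'EOF'
User-agent: *
Disallow: /private/
Crawl-delay: 10
EOF
```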

  • Did not know about robots.txt, great finding, thanks. I had no idea about rate limiting, but I don't think that's the case here, because the very first call after deployment was blocked. Commented Feb 4, 2022 at 9:16
