Python Forum
Twitter scraping exclude some data - Printable Version


Twitter scraping exclude some data - Robbert - Aug-29-2017

Hello everyone! I'm new here and I'm also new to Python. I'm eager to learn Python because the possibilities are immense. Currently I'm working on a Twitter streaming script, which I pasted in the code section below.
I'm wondering how I should exclude data from the streamer.
1. For instance, I want to check whether the 'status' or 'location' fields are not null.
2. I would like to exclude some fields, for instance 'retweets'.

If someone could explain how I'm supposed to program [1] and [2], I would be very happy :)

from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener
import json

# consumer key, consumer secret, access token, access secret.
consumer_key = "xxx"
consumer_secret = "xxx"
access_token = "xxxx"
access_token_secret = "xxxx"


class StdOutlistener(StreamListener):

    def on_data(self, data):
        json_data = json.loads(data)
        print(json_data)
        # Open json text file to save the tweets
        with open('tweets.json', 'a') as tf:
            tf.write(data)
        return True

    def on_error(self, status):
        print(status)


auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
twitterStream = Stream(auth, StdOutlistener())
twitterStream.filter(track=["Test"])



RE: Twitter scraping exclude some data - Robbert - Aug-31-2017

Can anyone help me?
Is my question unclear?


RE: Twitter scraping exclude some data - nilamo - Aug-31-2017

Are those fields part of the json response you're receiving?


RE: Twitter scraping exclude some data - Robbert - Aug-31-2017

(Aug-31-2017, 05:33 PM)nilamo Wrote: Are those fields part of the json response you're receiving?

Yes. The JSON contains all the data that is available.
This tutorial: http://adilmoujahid.com/posts/2014/07/twitter-analytics/ gives an overview of the data and the JSON output when no filters are applied. Literally everything is passed through, and I would like to know whether it is possible to skip some fields. For instance, I'm not interested in the fact that someone has 20 followers.
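One way to approach both points is to treat each message as a plain dict after json.loads(). The sketch below is only an illustration: the helper names (get_path, clean_tweet) are made up for this example, and the field names ('text', 'user.location', 'retweet_count', 'retweeted_status') are assumptions about the tweet JSON that should be checked against the actual output.

```python
import json

# Assumed field names; adjust them to the JSON you actually receive.
REQUIRED_FIELDS = ("text", "user.location")
EXCLUDED_FIELDS = ("retweet_count", "retweeted_status")


def get_path(obj, dotted):
    """Follow a dotted path such as 'user.location'; return None if a step is missing."""
    for key in dotted.split("."):
        if not isinstance(obj, dict):
            return None
        obj = obj.get(key)
    return obj


def clean_tweet(raw, required=REQUIRED_FIELDS, excluded=EXCLUDED_FIELDS):
    """Return the tweet dict without the excluded top-level fields,
    or None when any required field is missing or null."""
    tweet = json.loads(raw)
    if any(get_path(tweet, name) is None for name in required):
        return None
    return {key: value for key, value in tweet.items() if key not in excluded}
```

Inside on_data() this could be used as cleaned = clean_tweet(data), writing json.dumps(cleaned) plus a newline to the file only when cleaned is not None.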


RE: Twitter scraping exclude some data - nilamo - Aug-31-2017

Here's a direct link to the StreamListener class from the tweepy module: https://github.com/tweepy/tweepy/blob/v3.5.0/tweepy/streaming.py#L30

You're currently using on_data(), which fires off for every single message.  Have you tried using one of the more specific ones, like on_status()?


RE: Twitter scraping exclude some data - Robbert - Sep-02-2017

(Aug-31-2017, 08:55 PM)nilamo Wrote: Here's a direct link to the StreamListener class from the tweepy module: https://github.com/tweepy/tweepy/blob/v3.5.0/tweepy/streaming.py#L30

You're currently using on_data(), which fires off for every single message.  Have you tried using one of the more specific ones, like on_status()?

Thanks for your reply and suggestion. I will have a look at the webpage you mentioned.
No, I haven't tried on_status, which would probably be better. But I have no idea how to use on_status in this particular script.

Do you perhaps have a link for that too?


RE: Twitter scraping exclude some data - nilamo - Sep-02-2017

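You currently use on_data. Replace the word "data" with "status" in the method name, and it should run. Note that on_status() receives a parsed Status object instead of the raw JSON string, so the json.loads() call and the tf.write(data) call need to be adapted accordingly.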
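A minimal sketch of what that change could look like, assuming tweepy 3.x (where StreamListener exists) and assuming the Status object keeps the parsed dict on its _json attribute. The class name StatusListener, the path argument and the EXCLUDED_FIELDS names are made up for this example, and the import fallback is only there so the snippet also runs without tweepy 3.x installed.

```python
import json

try:
    from tweepy.streaming import StreamListener  # available in tweepy 3.x
except ImportError:
    StreamListener = object  # fallback so the sketch runs without tweepy 3.x

# Assumed names of top-level fields to drop before saving.
EXCLUDED_FIELDS = ("retweet_count", "retweeted_status")


class StatusListener(StreamListener):

    def __init__(self, path="tweets.json"):
        super().__init__()
        self.path = path

    def on_status(self, status):
        # Assumption: tweepy 3.x stores the parsed JSON dict on status._json.
        tweet = {key: value for key, value in status._json.items()
                 if key not in EXCLUDED_FIELDS}
        # Append one JSON document per line.
        with open(self.path, "a") as tf:
            tf.write(json.dumps(tweet) + "\n")
        return True

    def on_error(self, status_code):
        print(status_code)
```

With tweepy 3.x the listener would be passed to the stream the same way as before, for example Stream(auth, StatusListener()).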

