This document provides a comprehensive overview of message queues, highlighting their role in decoupling sender and receiver processes within web applications. It discusses the functionality, scenarios for use, implementation with RabbitMQ and Celery in Python, and practical coding examples for task management. Additionally, it covers best practices for configuration and task routing to enhance application performance.
Life in a Queue
Tareque Hossain
Education Technology
What is a Message Queue?
• Message queues are:
  o Communication buffers
  o Between independent sender & receiver processes
  o Asynchronous
    • Time of sending is not necessarily the same as time of receiving
• In the context of web applications:
  o Sender: web application servers
  o Receiver: background worker processes
  o Queue items: tasks that the web server doesn’t have the time/resources to do
Inside a Message Queue
[Diagram: web app servers hand tasks T1 to T7 to an enqueue manager, which places them on queues Q1 and Q2 inside the message queue broker; a dequeue manager hands the tasks to worker servers.]
How does it work?
• Say a web application server has a task it doesn’t have time to do
• It puts the task in the message queue
• Other web servers can access the same queue(s) and put tasks there
• Queues are FIFO (First In First Out)
• Workers are greedy and they all watch the queues for tasks
• Workers asynchronously pick up the first available task on the queue when they are ready
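The flow above can be sketched with nothing but Python's standard library. This is only an in-process analogy, not how RabbitMQ or Celery are implemented: `run_workers` and the task tuples are hypothetical names, threads stand in for worker servers, and a `queue.Queue` stands in for the broker.

```python
import queue
import threading


def run_workers(tasks, num_workers=3):
    """Put (name, callable) tasks on one FIFO queue and let workers drain it."""
    task_queue = queue.Queue()            # queue.Queue is FIFO
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            task = task_queue.get()       # blocks until a task is available
            if task is None:              # sentinel: no more work for this worker
                break
            name, func = task
            value = func()
            with results_lock:
                results.append((name, value))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()

    # The "web server" side: enqueue the work and move on.
    for task in tasks:
        task_queue.put(task)
    for _ in threads:
        task_queue.put(None)              # one sentinel per worker

    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    demo_tasks = [("T%d" % i, lambda i=i: i * i) for i in range(1, 8)]
    print(sorted(run_workers(demo_tasks)))
```

Tasks leave the queue in FIFO order, but because several workers run at once, the order in which they finish is not guaranteed.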
Do I need Message Queues?
• Message queues are useful in certain situations
• General guidelines:
  o Does your web application take more than a few seconds to generate a response?
  o Are you using a lot of cron jobs to process data in the background?
  o Do you wish you could distribute the processing of the data generated by your application among many servers?
Wait, I’ve heard “asynchronous” before!
• Yes. AJAX is an asynchronous communication method between client & server
• Some of the response time issues can be solved:
  o With AJAX responses that continually enhance the initial response
  o Only if the AJAX responses also complete within a reasonable amount of time
• You need message queues when:
  o Long processing times can’t be avoided in generating responses
  o You want application data to be continuously processed in the background and readily available when requested
MQ Tasks: Processing User Uploads
• Resize uploaded images to generate different resolutions, avatars, gallery snapshots
• Reformat videos to match your player requirements
• YouTube, Facebook, Slideshare are good examples
MQ Tasks: Generate Reports
• Generating reports from large amounts of data
  o Reports that contain graphical charts
  o Multiple reports that cross-reference each other
MQ Tasks: 3rd Party Integrations
• Bulk processing of 3rd party service requests
  o Refund hundreds of transactions using PayPal
  o Any kind of data synchronization
  o Aggregation of RSS/other feeds (e.g. a social network feed aggregator)
MQ Tasks: Cron Jobs
• Any cron job that is not time sensitive
  o The asynchronous behavior of a message queue doesn’t guarantee execution of tasks on the dot
  o Jobs in cron that should be done as soon as resources become available are good candidates
OMG That’s too much!
• Yeah. I agree.
• Read great research details at the Second Life dev site
  o http://wiki.secondlife.com/wiki/Message_Queue_Evaluation_Notes
• Let’s simplify. How do we choose?
  o How is the exception handling and recovery?
  o Is maintenance relatively low?
  o How easy is deployment?
  o Are the queues persistent?
  o How is the community support?
  o What language is it written in? How compatible is that with our current systems?
  o How detailed is the documentation?
Choice of PBS Education
• We chose AMQP & RabbitMQ
• Why?
  o We don’t expect message volumes as high as 1M or more at a time
  o RabbitMQ is free to use
  o The documentation is decent
  o There is decent clustering support, even though we never needed clustering
  o We didn’t want to lose queues or messages upon broker crash/restart
  o We develop applications using Python/Django, and setting up an AMQP backend using Celery/Kombu was easy
Message Queue Solution Stack
[Diagram: RabbitMQ sits between the web application server and the queue worker; each side runs Celery on top of PyAMQPlib/Kombu.]
Celery? Kombu? Yummy.
• Django made web development using Python a piece of cake
• Celery & Kombu make using message queues in your Django/Python applications a piece of cake
• Kombu
  o AMQP-based messaging framework for Python, powered by PyAMQPlib
  o Provides fundamentals for creating queues, configuring the broker, and sending/receiving messages
• Celery
  o Distributed task queue management application
Celery Backends
• Celery is very, very powerful
• You can use Celery to emulate message queue brokers using a DB backend for the broker
  o Involves polling & is less efficient than AMQP
  o Use for local development
• Bundled broker backends
  o amqplib, pika, redis, beanstalk, sqlalchemy, django, mongodb, couchdb
• The broker backend is different from the task & task result store backend
  o The latter is used by Celery to store the results of a task, or errors if it failed
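To see why a database-backed broker involves polling, here is a minimal sketch using `sqlite3` from the standard library. It is not Celery's DB transport; the `tasks` table and the `make_queue`, `enqueue`, and `poll_once` helpers are hypothetical, and the code assumes a single worker (several workers polling the same table would need locking).

```python
import sqlite3


def make_queue(conn):
    """Create the (hypothetical) task table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tasks ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
    )


def enqueue(conn, payload):
    """Producer side: insert one task row."""
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO tasks (payload) VALUES (?)", (payload,))


def poll_once(conn):
    """Worker side: fetch and delete the oldest row, or return None if empty."""
    with conn:
        row = conn.execute(
            "SELECT id, payload FROM tasks ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        conn.execute("DELETE FROM tasks WHERE id = ?", (row[0],))
        return row[1]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    make_queue(connection)
    enqueue(connection, "resize avatar 17")
    print(poll_once(connection))   # the task that was enqueued
    print(poll_once(connection))   # None: nothing left, so a worker would sleep and retry
```

A worker has to keep calling `poll_once` on a timer even when the table is empty, which is the overhead an AMQP broker avoids by pushing messages to consumers.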
A Problem with a View
• What is wrong with this view?

    def create_report(request):
        # ... code for extracting parameters from request ...
        # ... code for generating report from lots of data ...
        return render_to_response('profiles/index.html', {
            'report': report,
        }, context_instance=RequestContext(request))
Let’s Write a Celery Task
• Writing Celery tasks was never any more difficult than this:

    import celery

    @celery.task()
    def generate_report(*args, **kwargs):
        # ... code for generating report ...
        report.save()
Let’s Write a Celery Task II
• If you want to customize your tasks, inherit from the base Task object:

    from celery.task.base import Task

    class GenerateReport(Task):
        def __init__(self, *args, **kwargs):
            # ... custom init code ...
            super(GenerateReport, self).__init__(*args, **kwargs)

        def run(self, *args, **kwargs):
            # ... code for generating report ...
            report.save()
Issuing a task
• After writing a task, we issue the task from within a request in the following way:

    def create_report(request):
        # ... code for extracting parameters from request ...
        generate_report.delay(**params)
        # or GenerateReport.delay(**params)
        messages.success(request, 'You will receive an email when report generation is complete.')
        return HttpResponseRedirect(reverse('reports_index'))
What happens when you issue tasks?
[Diagram: a request handler on the application server hands the task to Celery, which sends it to a queue on the broker; three Celery workers consume from that queue.]
Understanding Queue Routing
• A broker contains multiple virtual hosts
• Each virtual host contains multiple exchanges
• Messages are sent to exchanges
  o Exchanges are hubs that connect to a set of queues
• An exchange routes messages to one or more queues
[Diagram: a VHost containing an Exchange connected to Queues]
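As a rough mental model of an exchange acting as a hub, the sketch below routes messages to every queue bound with an exactly matching key. `DirectExchange` is a hypothetical toy class, not part of RabbitMQ, Kombu, or Celery, and plain `deque` objects stand in for broker queues.

```python
from collections import defaultdict, deque


class DirectExchange:
    """Toy exchange: delivers a message to every queue bound with the same key."""

    def __init__(self, name):
        self.name = name
        self._bindings = defaultdict(list)   # binding key -> list of queues

    def bind(self, target_queue, binding_key):
        self._bindings[binding_key].append(target_queue)

    def publish(self, message, routing_key):
        """Route the message; return how many queues received it."""
        targets = self._bindings.get(routing_key, [])
        for target_queue in targets:
            target_queue.append(message)
        return len(targets)


if __name__ == "__main__":
    reports, audit = deque(), deque()
    exchange = DirectExchange("tasks")
    exchange.bind(reports, "report.generate")
    exchange.bind(audit, "report.generate")   # one key can feed several queues
    exchange.publish({"report_id": 1}, "report.generate")
    print(len(reports), len(audit))
```

A message whose routing key has no binding is simply not delivered to any queue in this model.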
Understanding Queue Routing
• In Celery configurations:
  o binding_key binds a task namespace to a queue
  o exchange defines the name of an exchange
  o routing_key defines which queue a message should be directed to under a certain exchange
  o exchange_type = 'direct' routes for exact routing keys
  o exchange_type = 'topic' routes for namespaced & wildcard routing keys
    • * (matches a single word)
    • # (matches zero or more words)
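The wildcard rules for topic exchanges can be made concrete with a small matcher. This is an illustrative sketch of the rules as stated above (words separated by dots, `*` for exactly one word, `#` for zero or more words), not the broker's actual implementation; `topic_matches` is a hypothetical helper name.

```python
def topic_matches(binding_key, routing_key):
    """Return True if routing_key satisfies the topic pattern in binding_key."""
    pattern = binding_key.split(".")
    words = routing_key.split(".")

    def match(p, w):
        if p == len(pattern):
            return w == len(words)        # pattern used up: words must be too
        if pattern[p] == "#":
            # '#' may absorb zero or more of the remaining words
            return any(match(p + 1, k) for k in range(w, len(words) + 1))
        if w == len(words):
            return False                  # pattern expects a word, none left
        if pattern[p] == "*" or pattern[p] == words[w]:
            return match(p + 1, w + 1)
        return False

    return match(0, 0)


if __name__ == "__main__":
    for key in ("video.compress", "video.compress.hd", "video"):
        print(key, topic_matches("video.*", key), topic_matches("video.#", key))
```

With a binding key of `video.*`, only two-word keys starting with `video` match; `video.#` also accepts `video` on its own and any longer key under that namespace.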
Quick Tips

    # Route a task
    mytask.apply_async(
        args=[filename],
        routing_key="video.compress"
    )
    # Or define task mapping in CELERY_ROUTES setting

    # Set expiration for a task (in seconds)
    mytask.apply_async(args=[10, 10], expires=60)

    # Revoke a task using the task instance
    result = mytask.apply_async(args=[2, 2], countdown=120)
    result.revoke()

    # Or save the task ID (result.task_id) somewhere
    from celery.task.control import revoke
    revoke(task_id)
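For the CELERY_ROUTES alternative mentioned in the tips, a settings fragment might look roughly like the following. It assumes the uppercase setting names and dictionary-style queue definitions used by Celery releases of that period (newer releases use different names); the queue names, the `media` exchange, and the `myapp.tasks.compress_video` task path are all hypothetical.

```python
# settings.py fragment (illustrative names only)

CELERY_QUEUES = {
    "default": {
        "exchange": "default",
        "exchange_type": "direct",
        "binding_key": "default",
    },
    "video": {
        "exchange": "media",
        "exchange_type": "topic",
        "binding_key": "video.#",    # anything in the video.* namespace
    },
}

CELERY_DEFAULT_QUEUE = "default"

# Map a task name to a queue so callers need not pass routing_key each time.
CELERY_ROUTES = {
    "myapp.tasks.compress_video": {
        "queue": "video",
        "routing_key": "video.compress",
    },
}
```

With a mapping like this in place, the task is routed by name, and a plain `delay()` call would end up on the `video` queue.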
Quick Tips
• Execute a task as a blocking call using:

    generate_report.apply(kwargs=params, **options)

• Avoid issuing tasks inside an asynchronous task that then waits on its children’s data (blocking)
  o Write re-usable pieces of code that can be called as functions instead of as tasks
  o If necessary, use the callback + subtask feature of Celery
• Ignore results if you don’t need them
  o If your asynchronous task doesn’t return anything:

    @celery.task(ignore_result=True)
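The advice about not waiting on child tasks can be illustrated with a thread pool as a stand-in for Celery workers. This is an analogy only and does not use Celery's callback or subtask API; `load_rows`, `summarize`, and `build_report` are hypothetical names. With a single worker, a parent task that blocked on its child would never finish, whereas chaining with callbacks completes.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def load_rows(count):
    """First step: pretend to fetch some data."""
    return list(range(1, count + 1))


def summarize(rows):
    """Second step: depends on the output of load_rows."""
    return sum(rows)


def build_report(pool, count):
    """Chain load_rows -> summarize without any task blocking on another."""
    final = Future()

    def on_summary(done):
        final.set_result(done.result())

    def on_rows(done):
        # Runs once load_rows has finished; schedules the follow-up step.
        pool.submit(summarize, done.result()).add_done_callback(on_summary)

    pool.submit(load_rows, count).add_done_callback(on_rows)
    return final


def run_example():
    # A single worker is enough because no task ever waits inside the pool.
    with ThreadPoolExecutor(max_workers=1) as pool:
        return build_report(pool, 4).result(timeout=5)


if __name__ == "__main__":
    print(run_example())
```

Only the caller outside the pool waits for the final result; each step releases its worker before the next step is scheduled.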
Good to know
• Do check whether your task parameters are serializable
  o WSGI request objects are not serializable
  o Don’t pass request as a parameter for your task
• Don’t pass unnecessary data in task parameters
  o They have to be stored until the task is complete
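A quick way to check parameters before queuing them is to try serializing them yourself. The sketch below assumes pickle is the serializer in use; `FakeRequest` is a hypothetical stand-in for a request object that holds live resources, and `is_picklable` and `task_params_from` are illustrative helpers.

```python
import pickle
import threading


def is_picklable(value):
    """Return True if value survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(value)
    except Exception:
        return False
    return True


class FakeRequest:
    """Hypothetical request-like object that holds a non-serializable resource."""

    def __init__(self, user_id):
        self.user_id = user_id
        self._lock = threading.Lock()   # locks cannot be pickled


def task_params_from(request):
    """Extract only plain identifiers; the worker can re-fetch what it needs."""
    return {"user_id": request.user_id}


if __name__ == "__main__":
    request = FakeRequest(user_id=42)
    print(is_picklable(request))                    # the whole request object
    print(is_picklable(task_params_from(request)))  # just the plain parameters
```

Passing identifiers rather than whole objects also keeps the stored message small, which matches the second tip above.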
Good to know
• Avoid starvation of tasks using multiple queues
  o If really long video re-formatting tasks are processed in the same queue as relatively quicker thumbnail generation tasks, the latter may starve
  o Only available when using the AMQP broker backend
• Use celerybeat for time-sensitive repeated tasks
  o Can replace time-sensitive cron jobs related to your web application
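The starvation effect is easy to see with a little arithmetic. The sketch below is a simplified model, assuming one worker per queue and made-up durations in seconds; `completion_times` and the job names are hypothetical.

```python
def completion_times(jobs):
    """Finish time of each job when one worker drains a FIFO queue in order."""
    clock = 0
    finished = {}
    for name, seconds in jobs:
        clock += seconds
        finished[name] = clock
    return finished


# Hypothetical workload: one long video job queued ahead of two thumbnails.
JOBS = [("video", 600), ("thumb_a", 2), ("thumb_b", 2)]

# One shared queue: thumbnails wait behind the video job.
shared = completion_times(JOBS)

# Two queues, each with its own worker: thumbnails no longer wait.
video_queue = [job for job in JOBS if job[0].startswith("video")]
thumb_queue = [job for job in JOBS if job[0].startswith("thumb")]
split = {**completion_times(video_queue), **completion_times(thumb_queue)}

if __name__ == "__main__":
    print("shared queue:", shared)
    print("split queues:", split)
```

In the shared case the first thumbnail finishes at 602 seconds; with a dedicated queue it finishes at 2 seconds, while the video job takes 600 seconds either way.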
Q&A
• Slides available at:
  o http://www.slideshare.net/tarequeh
• Extensive guides & documentation available at:
  o http://ask.github.com/celery/