
Celery is one of those things in Python that you can't (sometimes unfortunately) live without. Earlier versions of Celery had some difficult bugs and inconsistencies that made it a very tough tool to work with, requiring a lot of developer diligence and operational experience to keep it from breaking. Things like message memory explosion (multi-pass deserialization), poor defaults, difficulty debugging and tracking exceptions, weak monitoring tools like 'flower', etc. all led to this. Problems were exacerbated by the fact that simple async operations in Python (easily handled in more concurrent languages with a simple go func() { ... }()) end up requiring a heavy distributed solution like Celery (or the lighter RQ worker), which creates a whole host of issues.

I imagine that as native async tooling improves in Python 3.x (async/await, aiohttp, and other tools), use of Celery for trivially concurrent things will decrease and its usage will focus on more complex workflows (chords, fanouts, map/reduce).

Looks like many concerns were tackled here (thanks, Celery team) and I'm looking forward to playing around with this release.



I've managed to live without it. For low- and medium-traffic sites it's hugely over-engineered. For all the sites I manage, I run a single cron task that triggers a range of background jobs. It's an approach that has worked very well for nearly 10 years.

I understand there's a genuine use-case for Celery but like many technologies people are told "if you need a task queue use this" when there are much simpler solutions that are more than good enough for most.
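The single-cron-task pattern described above can be sketched in a few lines. This is a hypothetical runner, not anything from Celery or the commenter's actual code: one crontab entry invokes one script, which runs every registered job and keeps going past failures.

```python
# run_jobs.py -- invoked by a single cron entry, e.g.:
#   */5 * * * * /usr/bin/python3 /srv/app/run_jobs.py
# Job names and bodies here are illustrative placeholders.
import traceback

def send_queued_emails():
    pass  # real work would go here

def expire_old_sessions():
    pass  # real work would go here

JOBS = [send_queued_emails, expire_old_sessions]

def run_all(jobs=JOBS):
    """Run every job; one failure doesn't stop the rest."""
    failures = []
    for job in jobs:
        try:
            job()
        except Exception:
            failures.append(job.__name__)
            traceback.print_exc()
    return failures

if __name__ == "__main__":
    run_all()
```

What you give up versus a real queue is concurrency and retry semantics; what you gain is having nothing to operate beyond cron itself.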


Celery was never meant as a replacement for cron; that was simply a nice bonus that fits the messaging pattern well. Writing a task queue is actually very simple using, for example, Redis, but that doesn't necessarily mean Celery is over-engineered IMHO. It's very easy to forget the support required once your system is in production.
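A task queue on top of Redis really can be a few lines. A minimal sketch of the idea (the function names and JSON payload layout are my own, not Celery's or RQ's; `client` is anything exposing `rpush`/`blpop`, such as a `redis.Redis` instance):

```python
import json

QUEUE_KEY = "tasks"

def enqueue(client, func_name, *args, **kwargs):
    # Producer side: serialize the call and push it onto a list.
    payload = json.dumps({"func": func_name, "args": args, "kwargs": kwargs})
    client.rpush(QUEUE_KEY, payload)

def run_next(client, registry):
    # Worker side: block until a task arrives, look up the function
    # by name in `registry`, and execute it.
    _key, payload = client.blpop(QUEUE_KEY)
    task = json.loads(payload)
    return registry[task["func"]](*task["args"], **task["kwargs"])
```

Which is exactly the point about production support: this sketch has no retries, no acknowledgements, no result storage, and a worker crash mid-task loses the task.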

Disclaimer: I'm a contributor


By "contributor" you mean you are the main author, right? XD

Joking aside, we use Celery in production for asynchronous number crunching. We build a web UI in Django that raises Celery tasks that run long number crunching jobs. When the number crunching is done we look at the report through our Django web UI.

It works well enough, though we wish there were better built-in task management. E.g. A built-in API to reshuffle tasks (e.g. for I/O resource balancing) would be nice. celery purge also seems too drastic in purging every task in every queue. Or maybe we're just doing it wrong.


I'm not saying Celery is over-engineered in general. It's just over-engineered in the context I've often seen it recommended in. i.e. for people learning or people with fairly modest requirements.


Yep, I first asked on Stack Overflow for the best way to achieve a background task in a Django app, and the answers all said Celery. Considering it got run around once a week, I ended up with a very over-engineered solution.

It's useful to know Celery, and it gets used in a proper context in my current work, so I guess learning it wasn't a waste.


I think the advantage with tools like celery is that it deals with the many different failure scenarios pretty well.

It's kind of like jQuery's ajax. Of course you can figure out how to make a simple replacement. But then you have to manage the 100 different edge cases for when your code doesn't go down the happy path.

Much easier to write a simple task queue that's good enough than a replacement for $.ajax, though...


Well said


For low and medium traffic sites, the simplest usage patterns are really easy: http://docs.celeryproject.org/en/latest/getting-started/intr...

The best thing about celery is that you don't have to use any of the advanced features. If you need the basics, stick to the basics.

On the flipside, if and when you do need more than the basics, your system will grow with you. No need to hack up cron monstrosities as you grow beyond a single server. Though, cron vs celery is kind of an apples to... celery comparison.


Even if the getting started is simple, there's still a ton of code and dependencies you're bringing into your project. Most importantly, you've suddenly got dependencies on persistent processes such as Redis and RabbitMQ which need to be installed system-wide. That's something that needs to be factored into deploys, configs, restarts etc. You now need new tutorials for every deploy method (Heroku, Webfaction, PythonAnywhere and other outliers).

Yeah - everyone should ideally be comfortable with all this stuff, but I try to keep anything that's more complex than a pip install in a virtualenv to an absolute minimum.


When building a small web app a few weeks ago, I tried to avoid additional dependencies, too. I was a bit stuck when I needed to send an email asynchronously. Do you have a _simple_ recommendation to solve this without pulling in something huge like celery+redis?


Depends on what you need. The bare minimum is to just spawn a process so the request can return and your email can send. You tend to lack much control over failure conditions that way, so you'd want some code in the normal request/response cycle that checks for success or failure and informs the user.

You could add your emails to a db table and have a cron job consume them.
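The db-table-plus-cron idea sketched with sqlite3 from the stdlib (table and function names are illustrative; `send` stands in for whatever actually does SMTP delivery):

```python
import sqlite3

def init(conn):
    # An "outbox" table: the web app inserts, a cron job drains.
    conn.execute("""CREATE TABLE IF NOT EXISTS outbox (
        id INTEGER PRIMARY KEY, recipient TEXT, subject TEXT,
        body TEXT, sent INTEGER DEFAULT 0)""")

def queue_email(conn, recipient, subject, body):
    # Called from the request/response cycle; fast, no SMTP involved.
    conn.execute(
        "INSERT INTO outbox (recipient, subject, body) VALUES (?, ?, ?)",
        (recipient, subject, body))
    conn.commit()

def drain(conn, send):
    # Called from cron; `send(recipient, subject, body)` does delivery.
    rows = conn.execute(
        "SELECT id, recipient, subject, body FROM outbox WHERE sent = 0"
    ).fetchall()
    for row_id, recipient, subject, body in rows:
        send(recipient, subject, body)
        conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
    conn.commit()
    return len(rows)
```

A nice side effect is that unsent emails survive a crash, which the spawn-a-process approach doesn't give you.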

But for sending an occasional email? I've never really had a problem with just connecting to the SMTP server in the request/response cycle. If it takes more than a second to send then something is seriously wrong. You could use Mailgun or similar services which have their own queue for handling bulk sends and further reduce the likelihood of a problematic blocking of the web-server.


I'm on a corporate network where sending the email over our internal SMTP server takes about 5 seconds. I don't want the user to wait for this delay, so I'm using Celery currently. It just felt a bit too much for such a little task.

Maybe I will try something simpler like http://stackoverflow.com/a/4447147
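The thread-based approach amounts to a few lines; roughly (not copied from the linked answer, and `send_func` is a stand-in for the real SMTP call):

```python
import threading

def send_async(send_func, *args, **kwargs):
    # Fire-and-forget: run the slow send in a daemon thread so the
    # request can return immediately. The trade-off versus a real
    # queue is that failures in the thread are invisible to the user
    # and unsent mail is lost if the process dies.
    t = threading.Thread(target=send_func, args=args, kwargs=kwargs,
                         daemon=True)
    t.start()
    return t
```

For a 5-second internal SMTP hop that's probably acceptable; for anything where delivery matters, you'd want the email persisted somewhere first.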


RQ + Redis is "easier": http://python-rq.org/

If you are on AWS, you could just use something like their email service to fire off the request, then you don't need a queue at all.


For little things, it doesn't matter much what you do, and for bigger things I've moved to Luigi for background tasks.



