
Preventing duplicate entries in Queue. Adding only 1 job to a queue. #102

Open

epicwhale opened this issue Jul 2, 2014 · 18 comments

@epicwhale
Contributor

This is more of a query than an issue.

If I have a Products queue with a list of products that need to be updated using data from a remote source, I want to ensure that no two products get updated at the same time, as it causes some concurrency issues.

What are the recommended ways to approach this problem?

  1. Make sure my jobs are concurrency-proof (using database locks, etc.), but that is difficult to achieve with NoSQL.
  2. Ensure that the same job is not processed in parallel. (How can I achieve this?)
  3. Ensure that there are no duplicate jobs in the queue. (How can I achieve this?)

Any other solutions would be appreciated.

@danhunsaker

Option 2 - Set up a dedicated queue for sequential jobs, and assign only one worker to it. Assign all jobs that need to be done sequentially to that queue instead of the default(s). Mission accomplished.
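If it helps, here is a minimal sketch of that setup, assuming php-resque workers managed by Supervisor (the program name, script path, and queue name are all hypothetical):

```ini
; Hypothetical Supervisor entry: exactly one worker bound to the 'sequential' queue.
; With a single process draining it, jobs on this queue can never run in parallel.
[program:resque-sequential]
command=php resque.php
environment=QUEUE="sequential",COUNT="1"
numprocs=1
autorestart=true
```

Jobs that must run one at a time are then enqueued to `sequential` instead of `default`; everything else keeps using the regular worker pool.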

@epicwhale
Contributor Author

@danhunsaker I forgot to mention this earlier, but the complication here is that our use case is an e-commerce platform syncing products for multiple stores' product queues. New stores can be added automatically, and each new store should have its own product queue...

But this has to happen dynamically, as we can't reconfigure running workers, queue names, etc. in Supervisor on Linux each time we add a store.

@danhunsaker

Does each store need a unique sync queue, or can the application operate with a single sync queue and a separate work queue for each store for other operations?

@epicwhale
Contributor Author

@danhunsaker as of now, each store doesn't need a unique sync queue for anything except product syncing. Everything else can be pooled into a common 'default' queue.

I need to make sure that one store's product sync is not delayed because another store has too many items in the queue; that would create a quality-of-service issue. Hence, one queue per store for product sync.

Do share your thoughts!
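One way to get a per-store queue without touching worker configuration by hand is to derive the queue name from the store ID. A minimal sketch, assuming php-resque-style named queues (the helper and the naming scheme are hypothetical):

```php
<?php
// Hypothetical helper: derive a dedicated product-sync queue name per store,
// so one store's backlog can never delay another store's sync jobs.
function productSyncQueue(string $storeId): string
{
    // Normalize the ID so the queue name is predictable and Redis-safe.
    $safe = preg_replace('/[^a-z0-9_]/', '_', strtolower($storeId));
    return 'product_sync_' . $safe;
}

// A worker for a new store would then listen on that queue, e.g. (php-resque style):
//   QUEUE=product_sync_acme_store COUNT=1 php resque.php
echo productSyncQueue('Acme-Store'), "\n"; // product_sync_acme_store
```

The remaining wrinkle is starting a worker when a store is created, which still needs some hook into the process manager.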

@danhunsaker

My first response to that is virtualization. Each new store spins up a new VM, and anything it needs to do separately from other stores is done there. If I were implementing it, I'd have each store's web interface in its own VM, and possibly separate workers out as well. You'd still have a common worker pool in its own VM, and all the workers would connect to a single Redis instance.

However, that's often beyond the available technology, and it's probably too late in your dev cycle to set things up that way anyhow. So you'll want another approach. Ruby's Resque has a plugin that allows certain queues to be marked as sequential when they're created, and then ensures that jobs are not removed from those queues while other workers are processing jobs from them. I haven't looked at its code to see how portable it would be to how PHP Resque operates, but it's a starting point, I think.
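The core of that sequential-queue idea is a per-queue lock: a worker may only pop from the queue after acquiring a lock that any other worker for the same queue would fail to get. A rough sketch of the mechanism, with an in-memory array standing in for the Redis key a real implementation would set with SETNX:

```php
<?php
// Sketch of a per-queue lock: acquire before working a queue, skip it if
// another worker already holds the lock. The array below stands in for
// Redis; real code would use SET key value NX EX via phpredis or Predis.
class QueueLock
{
    private array $locks = []; // stand-in for Redis keys

    public function acquire(string $queue, string $workerId): bool
    {
        if (isset($this->locks[$queue])) {
            return false; // another worker is already processing this queue
        }
        $this->locks[$queue] = $workerId;
        return true;
    }

    public function release(string $queue, string $workerId): void
    {
        // Only the lock holder may release, mirroring a check-and-delete script.
        if (($this->locks[$queue] ?? null) === $workerId) {
            unset($this->locks[$queue]);
        }
    }
}

$lock = new QueueLock();
var_dump($lock->acquire('store_42', 'worker-a')); // true: lock taken
var_dump($lock->acquire('store_42', 'worker-b')); // false: queue busy
$lock->release('store_42', 'worker-a');
var_dump($lock->acquire('store_42', 'worker-b')); // true: lock is free again
```

In production the lock needs an expiry (the EX part) so a crashed worker doesn't wedge its queue forever.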

@epicwhale
Contributor Author

@danhunsaker that does seem like overkill, especially since I'm building a SaaS service and want this to scale to a few hundred and then thousands of customers.

I did see the Ruby approach to serializing jobs in a queue with locking, but it looks a tad too complicated to replicate and maintain: http://www.bignerdranch.com/blog/never-use-resque-for-serial-jobs/ (it's almost like maintaining another stand-alone project within my project). I don't have the benefit of time there.

Maybe for the products queue I should be exploring some other alternative? Do you know of any other background-job or MQ solution that supports this and has a good bundle/library for PHP/SF2?

@mrbase
Contributor

mrbase commented Jul 2, 2014

@epicwhale I'm currently looking at Gearman, which has a bundle and is under active development, though it needs SF 2.4.

Whether it meets your requirements I don't know, but it's simple, fast, and scales.

Otherwise, look at http://queues.io/ for a fine collection of queue systems.

@mrbase
Contributor

mrbase commented Jul 2, 2014

And there is a PECL extension for PHP as well: http://www.php.net/manual/en/book.gearman.php

@danhunsaker

In my experience, virtualization scales better, and is more secure to boot. But my experience varies wildly from that of many others, who haven't had any problem using such platforms as cPanel and WordPress for all of their needs. I just got tired of one site being able to consume the full resources of my servers, with no reliable way to restrict its activities without affecting anyone else. I also got tired of one hacked site infecting everything on the server. As with anything, your mileage will vary.

Resque wasn't really designed for sequential operation, and making it do so anyway will always be a hack. Even scheduled tasks are a hack, really. So PHP-Resque may not be your best fit. As to alternatives, there are many, and @mrbase has presented some useful starting points. I can't speak to Symfony interop, because I don't use Symfony. To me, SF2 is overkill. :-) I'm sure I'll encounter a project where Symfony makes sense eventually, though.

Best of luck!

@mdjaman

mdjaman commented Jul 24, 2014

@danhunsaker How do I do this:

"Option 2 - Set up a dedicated queue for sequential jobs, and assign only one worker to it. Assign all jobs that need to be done sequentially to that queue instead of the default(s)."

@epicwhale
Contributor Author

Why didn't anyone suggest the enqueueOnce(..) function in this bundle? I also noticed that it isn't documented for some reason...

cc: @danhunsaker

@danhunsaker

Possibly because that's not actually what was asked for. The question wasn't about preventing more than one instance of a job from being queued at a time; it was about preventing more than one instance from being run at a time. Very different approach, then.

Also, the fact it's undocumented doesn't help.

@epicwhale
Contributor Author

Point 3 in the question was "Ensure that there are no duplicate jobs in the queue. (How can I achieve this?)" I guess this solves that?

Yes, the enqueueOnce(..) function seems to be a hidden gem. Has it been tested / used in production?

@danhunsaker

Better to write idempotent jobs, but yeah, that would probably also work.

I honestly don't recall.
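For context, an idempotent job is one that can safely run twice (duplicate enqueue, retry after a crash) and leave the same end state. A sketch under the assumption that the remote source is the source of truth (all names here are illustrative):

```php
<?php
// Idempotency sketch: the job writes the product to its target state rather
// than applying a delta, so running it twice leaves the same result.
function syncProduct(array &$db, string $sku, array $remote): void
{
    // Overwrite with the remote snapshot instead of incrementing or patching.
    $db[$sku] = [
        'price' => $remote['price'],
        'stock' => $remote['stock'],
    ];
}

$db = [];
$remote = ['price' => 999, 'stock' => 5];
syncProduct($db, 'SKU-1', $remote);
syncProduct($db, 'SKU-1', $remote); // duplicate run: state is unchanged
echo $db['SKU-1']['stock'], "\n";   // 5
```

With jobs written this way, a stray duplicate in the queue is wasted work but not a correctness bug.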

@danhunsaker

I take that back. I don't know how much testing enqueueOnce() has gotten, but it's brand new, added within the last couple of weeks, which is why it's neither documented nor mentioned above: it didn't exist yet. I somehow completely forgot working with the contributor on that one.

Hopefully we'll see some documentation on that soon.

@darkromz

@epicwhale I came across this thread while looking for the same thing: preventing duplicate jobs from being added to the queue. I saw your comment about "enqueueOnce", but as you also mentioned, I can't seem to find anything about it. Can you give any code examples of how to use it?

@epicwhale
Contributor Author

@darkromz it's been a long time since I've worked with anything around this library... have a look at the function, maybe?

public function enqueueOnce(Job $job, $trackStatus = false)
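I haven't checked the bundle's actual implementation, so treat this as a conceptual sketch of what an enqueueOnce-style guard does rather than its real code: fingerprint the job (class plus arguments) and skip the enqueue if an identical job is already waiting. A plain array stands in for the Redis set a real implementation would use:

```php
<?php
// Conceptual sketch of dedup-on-enqueue: hash the job class and its args,
// and refuse to enqueue if that fingerprint is already pending.
class DedupQueue
{
    private array $queue = [];
    private array $seen = [];   // fingerprints of queued-but-unprocessed jobs

    public function enqueueOnce(string $jobClass, array $args): bool
    {
        $fingerprint = sha1($jobClass . '|' . json_encode($args));
        if (isset($this->seen[$fingerprint])) {
            return false; // an identical job is already waiting
        }
        $this->seen[$fingerprint] = true;
        $this->queue[] = [$jobClass, $args];
        return true;
    }
}

$q = new DedupQueue();
var_dump($q->enqueueOnce('SyncProduct', ['sku' => 'SKU-1'])); // true
var_dump($q->enqueueOnce('SyncProduct', ['sku' => 'SKU-1'])); // false: duplicate
var_dump($q->enqueueOnce('SyncProduct', ['sku' => 'SKU-2'])); // true
```

Note this only dedupes jobs still in the queue; once a job is popped, the fingerprint must be cleared or an identical job can never be enqueued again.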

@darkromz

Thanks for the quick reply, I will give it a look.
