Monday, May 21, 2018

More Practical advice - Producer-Consumer threads

Two in one day!  What's the world coming to?

OK, a common paradigm is the producer-consumer threaded application, which uses a queue to pass messages from the producer to the consumer threads that process them.  I recently coded one, and we were discussing how we'd manage the number of threads later on, when we want the application to allocate threads dynamically to match the load.

Now, adding threads to increase capacity is simple - you just need to create the thread with the appropriate parameters (input queue, output sink, etc.), save off the info so you can reap the thread at termination, and Bob's your uncle.

But removing threads is more complicated, because you don't want to kill a thread in the middle of processing a message.  So you need a means to tell a thread to die when it's done with the current message.

What I had come up with for shutting down all the threads at program end was pretty simple - the Producer emits a specially formatted message that, when read by a thread, tells it to exit its main loop and return.  What was moderately cool was that I had the thread re-emit the message into the queue, so the next thread to read it would also stop and re-emit it in turn, until all the threads had stopped.
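Here is a minimal sketch of that shutdown scheme in Python, assuming a standard-library `queue.Queue` shared by all the consumers. The `STOP` sentinel object is a hypothetical stand-in for the specially formatted message, and doubling each item stands in for real processing.

```python
import queue
import threading

# Hypothetical sentinel standing in for the specially formatted stop message.
STOP = object()


def consumer(work_q: queue.Queue, results: list, lock: threading.Lock) -> None:
    while True:
        msg = work_q.get()
        if msg is STOP:
            work_q.put(STOP)  # re-emit so the next thread to read it stops too
            return
        with lock:
            results.append(msg * 2)  # stand-in for real processing


def run(num_workers: int, items: list) -> list:
    work_q: queue.Queue = queue.Queue()
    results: list = []
    lock = threading.Lock()
    threads = [threading.Thread(target=consumer, args=(work_q, results, lock))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for item in items:       # the producer's real work
        work_q.put(item)
    work_q.put(STOP)         # a single stop message shuts down every consumer
    for t in threads:
        t.join()             # reap the threads at program end
    return sorted(results)


print(run(4, [1, 2, 3, 4, 5]))  # [2, 4, 6, 8, 10]
```

Because the queue is FIFO and the stop message goes in after the real work, every item is dequeued before any thread sees the stop message. The last thread's re-emitted copy is simply left in the queue.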

Then today I realized that if I added a simple counter to the message format, I could use it to reap a specific number of threads without any new mechanism.  A Consumer thread that sees the flag message exits its main loop and then checks whether the counter is 0.  If it isn't, the thread decrements it by 1 and re-emits the message with the new counter value.  So if the message starts with a counter of 3, three threads see it, stop, and re-emit it once each, leaving the counter at 0; a fourth thread then sees the 0, stops, and does not re-emit it.  In other words, a starting counter of N reaps N+1 threads, so to reap exactly N threads, start the counter at N-1.

This can also be used to kill all the threads by setting the counter to -1, which only moves further from 0 as it is decremented, so every thread will see, process, and re-emit the flag message (unless you have so many threads that the value wraps, but that's an entirely different problem).
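The counter variant can be sketched the same way. This is a hedged illustration, not the original code: the `(STOP, counter)` tuple is an assumed message format, and the consumer follows the check-then-decrement order described above, so a counter of 2 stops three threads. The `task_done()`/`join()` calls on the queue are only there so the demo can wait until a partial reap has finished.

```python
import queue
import threading

STOP = "STOP"  # hypothetical tag identifying the control message


def consumer(work_q: queue.Queue, exited: list, lock: threading.Lock) -> None:
    while True:
        msg = work_q.get()
        if isinstance(msg, tuple) and msg[0] == STOP:
            break  # leave the main loop, then deal with the counter
        # ... process a normal work item here ...
        work_q.task_done()
    counter = msg[1]
    if counter != 0:
        # Pass the stop message on with the counter reduced by one.
        # Putting it before task_done() keeps queue.join() from returning early.
        work_q.put((STOP, counter - 1))
    with lock:
        exited.append(threading.current_thread().name)
    work_q.task_done()  # the stop message this thread consumed is now handled


def demo(num_workers: int = 5) -> tuple:
    work_q: queue.Queue = queue.Queue()
    exited: list = []
    lock = threading.Lock()
    threads = [threading.Thread(target=consumer, args=(work_q, exited, lock))
               for _ in range(num_workers)]
    for t in threads:
        t.start()

    work_q.put((STOP, 2))  # threads see 2, 1, then 0, so three of them stop
    work_q.join()          # returns once the thread that saw 0 is done
    after_partial = len(exited)

    work_q.put((STOP, -1))  # -1 never reaches 0, so every remaining thread stops
    for t in threads:
        t.join()            # the last copy stays queued, so join the threads
    return after_partial, len(exited)


print(demo())  # (3, 5)
```

Note that the kill-all case can't wait on `work_q.join()`: the last thread's re-emitted message is never consumed, so the demo joins the threads instead.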



Practical advice re CSV files

Morning, everyone!

I'm sure everyone has had a situation where they were generating data for another system, needed a text format, and decided that comma-separated values (CSV) would work nicely.  Well, here's a little advice:  if you have any reason to expect a) human-driven input for fields, or b) reformatting of fields by some programmatic means, then choose a pipe '|' as the separator instead.

The reason is that there is little use for that character in normal human text, so it is unlikely to turn up within a field the way commas often do in things like addresses or names.  And as I found out recently, some encryption software is perfectly happy producing ciphertext with commas embedded.
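To illustrate, here is a small Python sketch that assumes the consuming side splits each line naively on the separator. The record and helper names are made up for the example. For comparison, it also shows that the standard `csv` module quotes a field containing the delimiter, which only helps if both sides use a quoting-aware parser.

```python
import csv
import io

# Hypothetical record with a human-entered address containing a comma.
record = ["Jane Doe", "12 Main St, Apt 4", "Springfield"]


def naive_round_trip(fields: list, sep: str) -> list:
    """Join on sep, then split on sep, the way a quick-and-dirty consumer might."""
    return sep.join(fields).split(sep)


def csv_module_round_trip(fields: list, sep: str) -> list:
    """Write and re-read one row with the csv module, which quotes as needed."""
    buf = io.StringIO()
    csv.writer(buf, delimiter=sep, lineterminator="\n").writerow(fields)
    return next(csv.reader(io.StringIO(buf.getvalue()), delimiter=sep))


print(len(naive_round_trip(record, ",")))  # 4: the address was split in two
print(len(naive_round_trip(record, "|")))  # 3: the fields survive intact
print(csv_module_round_trip(record, ",") == record)  # True: the field was quoted
```

The naive comma split turns the address into two fields while the pipe split keeps all three, which is exactly the failure a pipe separator guards against when you don't control how the other side parses the file.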