Re: Shape of a knotty(?) threading problem
- Subject: Re: Shape of a knotty(?) threading problem
- From: Dave Fernandes <email@hidden>
- Date: Mon, 25 Mar 2013 22:31:26 -0400
Correct me if my understanding is wrong:
Your "input logic" processes some input on thread 1 until it "settles". It then signals the "6502" object that input is available and waits until the 6502 picks up (copies) this data. The 6502 picks up the input data and starts processing it on thread 2, and also tells the input logic that it can start processing the next round of input.
If this is the case, then you can handle the synchronization with a shared dispatch semaphore. The 6502 thread waits on this semaphore, and the input logic thread signals it once the input has settled. After signalling, the input logic block simply exits; the 6502, having copied the settled state from the input logic, queues the next input logic "operation" with dispatch_async. When the 6502 has finished processing its input, it waits on the semaphore again.
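In code, the handshake might look something like this (an untested sketch; the Run/Copy/Execute functions and queue names are just stand-ins for whatever your input logic and 6502 objects actually do):

#import <Foundation/Foundation.h>

// Stand-ins for your own code - the shape of the handshake is the point here.
static void RunInputLogicUntilSettled(void) { /* process input until it settles */ }
static void CopySettledStateIntoCPU(void)   { /* 6502 copies the settled data */ }
static void ExecuteOneCPUChunk(void)        { /* 6502 processes the copied data */ }

static void StartHandshake(void)
{
    dispatch_semaphore_t settled = dispatch_semaphore_create(0);
    dispatch_queue_t inputQueue = dispatch_queue_create("input.logic", DISPATCH_QUEUE_SERIAL);
    dispatch_queue_t cpuQueue   = dispatch_queue_create("cpu.6502", DISPATCH_QUEUE_SERIAL);

    // Prime the first round of input processing.
    dispatch_async(inputQueue, ^{
        RunInputLogicUntilSettled();
        dispatch_semaphore_signal(settled);   // tell the 6502 the input has settled
    });

    dispatch_async(cpuQueue, ^{
        for (;;) {
            // Block here until the input logic has settled.
            dispatch_semaphore_wait(settled, DISPATCH_TIME_FOREVER);

            // Pick up (copy) the settled state...
            CopySettledStateIntoCPU();

            // ...immediately queue the next input logic operation...
            dispatch_async(inputQueue, ^{
                RunInputLogicUntilSettled();
                dispatch_semaphore_signal(settled);
            });

            // ...and process the copy on this thread in the meantime.
            ExecuteOneCPUChunk();
        }
    });
}

The two serial queues give you the two "threads" without having to manage NSThreads yourself, and the semaphore is the only shared synchronization object.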
On 2013-03-25, at 8:45 PM, Graham Cox <email@hidden> wrote:
> Hi,
>
> I'm trying to solve a fairly complicated problem, and I think threading will be a solution, but it's not something I'm a great expert on, so I'm hoping that someone will be kind enough to help me form the general shape of the solution. At first this might not seem to be a Cocoa question but it is, because everything I will describe is implemented in Cocoa and the solution must be too.
>
> I have a largeish body of code that implements the data model for a digital logic simulator. This simulator simulates down to the gate level (e.g. the simplest functional block is a single NOT gate) and is a cycle-based simulator. The exact details of its operation aren't that important, but broadly it consists of two kinds of objects, devices and inputs. A device is a functional block, such as a gate, but could be anything that implements some logical function in terms of inputs, outputs, time and its own state. If that sounds vague it probably is, but the idea is that more complex functions can be simulated in code for performance - in other words it is not *required* to simulate at the gate level, but it can. Inputs represent a single logic signal in the system, i.e. 1 bit or wire. Outputs are a subclass of inputs and form the network of connections between devices, passing data from output to input(s) representing logic level signals.
>
> The simulation runs in a cycle, which is a great big loop that iterates over the devices and tells each one to evaluate its inputs + current time + current state as needed and set its outputs. The outputs drive further inputs and so on. Note that changing an input does not immediately trigger evaluation of a device's function. This allows arbitrary feedback loops in a network without creating infinite recursion in code (where setting an output sets an input which in turn sets the same output, etc). The cycling ensures everything proceeds in an orderly fashion and feedback connections work correctly without any special kludges (as an aside a very popular logic simulator doesn't do this but has a nasty kludge that detects if a function recurses more than 99 times before telling the user they're a bozo. Needless to say it severely limits the kinds of circuits that can be simulated). It also means that the cycle time of the whole simulation is a sort of "atomic time" which sets a lower limit on how fast a device can respond, and as a circuit grows in complexity the time for the simulation to settle to its final correct state can vary quite a lot. This is fine, because it can be taken to simulate, roughly, real-world propagation delays and settling times. However, it is rather indeterminate, since the simulation has no concept of the "order" in which devices need to be evaluated - indeed, devices are held by an NSSet so the evaluation order is arbitrary. Worst case, the state of a system might only settle after n cycles where n is the number of devices, best case it would settle in 1 cycle. Typically it's somewhere in between. This doesn't have any bearing on the functional correctness of the simulation, but it does for the problem I'm working around to. (If this sounds wrong to you, consider that most circuits don't have any "order" once you have feedback and loops - where do you jump in?). Note that an optimisation in the system means that devices skip evaluation if their inputs haven't changed since the previous
> iteration, but if they are time-sensitive this optimisation can be opted out of. I mention this for completeness, I don't think it matters for the problem. Another thing to be aware of is that the simulation cycle itself can either be run on its own thread, or hitch a ride on the main thread. The simulator works beautifully and so far hasn't failed to correctly simulate any logic circuit I've put together with it.
>
> Now, I want to add a device to the system that can emulate a microprocessor. (I happen to have chosen the venerable 6502 to start, but this is a generic problem). I have some open source code in C that emulates the processor, but this code does not emulate the memory reads and writes very realistically. Essentially, even though it provides hooks for the reads and writes to be vectored through (my) external code, it still assumes that these reads and writes are fully synchronous, e.g. it requests memory from an address and expects the data to be returned synchronously before it continues with its emulated execution. It also assumes that it is driving the emulation, so it's designed to sit there in a tight loop fetching, decoding and executing the emulated code.
>
> The emulated processor needs to be reworked so that it works within the overall logic simulation, which is taking the emulation a stage further, so that it emulates hardware more realistically. (For my purposes this is more important than speed). This is where it gets complicated, because there isn't a 1:1 correspondence between the simulation cycle and, say, one execution cycle of the emulated cpu. The cpu might for example require several memory fetches to execute one instruction, and the memory fetch must be implemented by putting the address out onto the address lines and waiting for the simulated logic to eventually return some data on the data lines. This can take a completely indeterminate number of simulation cycles, since any amount of logic could exist between the address lines and the simulated memory device (address decoder logic, etc). The simulated processor device needs to have some sort of wait state to allow the general simulation time to perform the memory fetch.
>
> OK, long drawn-out scene setting, here comes the problem/possible solution.
>
> So here's what I'm thinking. The simulated cpu can run on its own thread, rather than be locked to the simulation thread. This allows the CPU to run at its own pace and operate in the way it was designed to - in charge of its own destiny. When it needs to perform a memory read or write, the external read/write handlers implement some sort of lock which blocks the CPU thread and issues the memory address on the simulation thread. The device is then put into a wait state. After some time the data is ready (this is a bit awkward and I'm not sure how to handle it - there isn't usually a 'data ready' line available in the logic, so it might just be a case of counting some arbitrary number of cycles and assuming/stipulating that the data must be ready after a certain number have elapsed; this is analogous to the data setup/settling time allowed in a real hardware system). That satisfies the lock, which unblocks the cpu thread, which continues with the returned data exactly at the point it left off. The cpu code is none the wiser, and doesn't need to be refactored to implement asynchronous memory access.
>
> Note that the blocked cpu thread could wait forever. If the simulated cpu device is not connected to any peripheral memory (as it would be when first added to a system) a memory fetch will never complete. That must be allowed - if the device is subsequently deleted, the blocked thread must be killed even though it never ran to completion.
>
> So my question is, does this sound like a feasible way to approach this? If so, what sort of thing is this lock - would NSConditionLock be useful here? Does it do what I'm supposing it does - block one thread until another thread satisfies the condition? If so, a brief example of its use would be gratefully received. Or some other completely different solution....
>
> Wholesale redesign of the simulator or the cpu emulator isn't really a possible solution, though I'm prepared to make some changes if it would make it easier. The cpu emulation is not code I wrote, though I have the source.
>
> So, hopefully this will pique someone's interest enough to shove me in the right direction.... many thanks in advance for any insights you might have.
>
>
>
> --Graham
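And on the NSConditionLock question: yes, it does what you suppose - one thread blocks in -lockWhenCondition: until another thread calls -unlockWithCondition: with the matching condition. A rough sketch of the memory-fetch handshake (MemoryPort, the condition values and the timeout are all just invented for illustration):

#import <Foundation/Foundation.h>

// Invented condition values for the handshake.
enum { kNoDataReady = 0, kDataReady = 1 };

@interface MemoryPort : NSObject
@property (nonatomic, strong) NSConditionLock *dataLock;
@property (nonatomic, assign) uint8_t data;
@end

@implementation MemoryPort

- (id)init
{
    if ((self = [super init])) {
        _dataLock = [[NSConditionLock alloc] initWithCondition:kNoDataReady];
    }
    return self;
}

// Called on the cpu thread from the emulator's read handler. Blocks until the
// simulation thread supplies the data, or until the timeout expires.
- (BOOL)waitForData:(uint8_t *)outData timeout:(NSTimeInterval)seconds
{
    NSDate *limit = [NSDate dateWithTimeIntervalSinceNow:seconds];
    if (![self.dataLock lockWhenCondition:kDataReady beforeDate:limit])
        return NO;                                      // timed out - nothing answered the fetch
    *outData = self.data;
    [self.dataLock unlockWithCondition:kNoDataReady];   // ready for the next fetch
    return YES;
}

// Called on the simulation thread once it decides the data lines are valid
// (e.g. after the stipulated number of simulation cycles has elapsed).
- (void)supplyData:(uint8_t)value
{
    [self.dataLock lockWhenCondition:kNoDataReady];
    self.data = value;
    [self.dataLock unlockWithCondition:kDataReady];     // wakes the blocked cpu thread
}

@end

Using -lockWhenCondition:beforeDate: rather than the plain -lockWhenCondition: also gives you a way out of the "blocked forever" case you mention: if the device isn't connected to anything, the fetch times out and the cpu thread can bail out or retry instead of having to be killed.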