Used TLA+ for security/availability modeling at a previous venture. Didn't find it added more value than other formal methods with less scope and ambition when it came to implementation, but I had very limited resources and we were greenfield so the experience probably wasn't generally applicable.
IMHO (from an implementation perspective): (1) If state is a pain to model, this may be a symptom you aren't placing interfaces in the right places (ie. scope is too broad). (2) If your system behavior departs from known models (ie. expected behavior under rigorously analyzed and easy to model consensus algorithms) then this may be a symptom of larger issues such as unrecognized state or edge cases, which are better tackled first. (3) A practical alternative to the theoretical model of 'safe' machines within the post would be an implementation model with a functional approach ... ie. if in doubt, reset to known state. (4) To model "bad" actors practically, you have to define them. To do so within meaningfully any complex (black box) multi-agent (networked) system, you need a good model of known "good" behavior for any given node. Building this is IMHO more useful than pie-in-sky theoretical modeling, because it allows the efficient (~fully automatic) deployment of practical countermeasures (anomaly detection on resource utilization, communication, etc.). To do this for near-arbitrary systems requires a tight CI/CD pipeline with above-average full stack security insight, for-purpose infrastructure and management level commitment to the additional resources it will cost. Only when all that is done, model for concurrently compromised nodes at this level, because you've surpassed more immediate/pragmatic requirements.
Well, for some definition of formal, both state machines and rigorously defined {interfaces|message sequences|protocols} (ie. those with specification-based, generative implementations of endpoints, validating middleware proxies, etc.).
There are two models of reality that I find to be the most useful ones, especially when writing programs. The first is functions, and the second is sequences of states. - Leslie Lamport
I find (variants of) QuickCheck https://hypothesis.works/articles/quickcheck-in-every-langua... are pretty good at finding at least a lot of the invariant problems that TLA+ finds, and if you're lucky your language has it integrated already. The F# implementation FsCheck is pretty amazing.
I got the terms from _Software Requirements and Specifications_, which is a fun, if fluffy and meandering, read. I only skimmed the paper you linked but it seems to be a more thorough treatment than the book was.
I don't know any game theory so can't say much about the landscape, but I know people've modified PRISM to model games: http://www.prismmodelchecker.org/games/
Fair warning though: PRISM is... weird. You can analyze some crazy complex properties in it but you don't get basic affordances like "strings", "lists", or "functions that take parameters".
this is the kind of dark magic that is fascinating to read about and I hope there are people actually using it in the described way to design and implement safe systems and protocols. I haven't heard of any (not that I looked much).
IMHO (from an implementation perspective): (1) If state is a pain to model, this may be a symptom you aren't placing interfaces in the right places (ie. scope is too broad). (2) If your system behavior departs from known models (ie. expected behavior under rigorously analyzed and easy to model consensus algorithms) then this may be a symptom of larger issues such as unrecognized state or edge cases, which are better tackled first. (3) A practical alternative to the theoretical model of 'safe' machines within the post would be an implementation model with a functional approach ... ie. if in doubt, reset to known state. (4) To model "bad" actors practically, you have to define them. To do so within meaningfully any complex (black box) multi-agent (networked) system, you need a good model of known "good" behavior for any given node. Building this is IMHO more useful than pie-in-sky theoretical modeling, because it allows the efficient (~fully automatic) deployment of practical countermeasures (anomaly detection on resource utilization, communication, etc.). To do this for near-arbitrary systems requires a tight CI/CD pipeline with above-average full stack security insight, for-purpose infrastructure and management level commitment to the additional resources it will cost. Only when all that is done, model for concurrently compromised nodes at this level, because you've surpassed more immediate/pragmatic requirements.