You can use the protocol buffer schema language to define your ASTs if you want, but I think that addresses only a relatively small part of the problem.
There are two larger problems in adding Lisp-style macros to non-Lisp languages, one social and one technical.
The social problem is that language designers must be persuaded to publish a specification of the internal representation of the AST of their language. This makes the AST a public interface, one which they are committed to and can't easily change. People don't like to do this without a good reason.
The technical problem is more difficult, though. To make a non-Lisp language as extensible as Lisp would require making the parser itself extensible. This is not too hard to implement, but perhaps not so easy to use. If you've ever tried to add productions to a grammar written by someone else, you know it can be nontrivial. You have to understand the grammar before you can modify it.
And if you overcome the difficulties of having one user in isolation add productions to the grammar, what happens when you try to load multiple subsystems written by different people using different syntax extensions which, together, make the grammar ambiguous?
I don't know that these problems are insurmountable, but a few people have taken a crack at them, and AFAIK no one has produced a system that any significant number of people want to use.
It's worth taking a look at how Lisp gets around these problems. Lisp has not so much a syntax as a simple, general metasyntax. Along with the well-known syntax rules for s-expressions, it adds the rule that a form is a list, and the meaning of the form is determined by the car of the list -- and if it's a macro, even the syntax of the form is determined thereby.
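To make the "car of the list decides" rule concrete, here is a minimal sketch in Python of a toy s-expression reader plus a macro expander. None of this is how any real Lisp is implemented; `MACROS`, `defmacro`, and the `unless` expansion are names made up for the illustration, and the expander naively walks into every subform (it knows nothing about quoting).

```python
def tokenize(text):
    """Split s-expression text into parens and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()

def read(tokens):
    """Read one s-expression, consuming tokens from the front of the list."""
    tok = tokens.pop(0)
    if tok == "(":
        form = []
        while tokens[0] != ")":
            form.append(read(tokens))
        tokens.pop(0)  # drop the closing paren
        return form
    return int(tok) if tok.lstrip("-").isdigit() else tok

# Hypothetical macro table: head symbol -> function from form to form.
MACROS = {}

def defmacro(name):
    """Register an expander under the given head symbol."""
    def register(fn):
        MACROS[name] = fn
        return fn
    return register

@defmacro("unless")
def expand_unless(form):
    # The macro alone decides how the rest of the list is interpreted:
    # (unless test body) => (if test nil body)
    _, test, body = form
    return ["if", test, "nil", body]

def macroexpand(form):
    """Expand macros by looking only at the first element of each list."""
    if not isinstance(form, list) or not form:
        return form
    head = form[0]
    if isinstance(head, str) and head in MACROS:
        return macroexpand(MACROS[head](form))
    return [macroexpand(sub) for sub in form]

expanded = macroexpand(read(tokenize("(print (unless (> x 0) (neg x)))")))
```

The reader never needs to know that `unless` exists; only the expander consults the head symbol, which is why adding a macro cannot make the reader's grammar ambiguous.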
Add a package system like CL's, and you get pretty good composability of subsystems containing macros. You can get conflicts, but only when you explicitly create a new package and attempt to import macros from two or more existing packages into it.
Applying these ideas to a conventional language gives us, I think, the following:
(1) While the grammar is extensible, all user-added productions must be "left-marked": they must begin with an "extension keyword" that appears nowhere else in the grammar.
(2) Furthermore, those extension keywords are scoped: they are active only within certain namespaces; elsewhere they are just ordinary names. This requires parsing itself to be namespace-relative, which is a bit weird, but probably workable.
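A rough sketch of what those two rules could look like, using a toy statement language in Python. Everything here is hypothetical: the `EXTENSIONS` registry, the `extension` decorator, the `repeat` production, and the idea of passing the active namespaces to `parse` are all assumptions made for the example, not a design anyone has shipped.

```python
# Hypothetical registry: namespace -> {extension keyword -> parse function}.
EXTENSIONS = {}

def extension(namespace, keyword):
    """Register a left-marked production under a namespace."""
    def register(parse_fn):
        EXTENSIONS.setdefault(namespace, {})[keyword] = parse_fn
        return parse_fn
    return register

@extension("loops", "repeat")
def parse_repeat(tokens, pos):
    # Production: repeat <count> <name> ;
    count, name, semi = tokens[pos + 1 : pos + 4]
    if semi != ";":
        raise SyntaxError("expected ';' after repeat statement")
    return ("repeat", int(count), name), pos + 4

def parse(source, namespaces=()):
    """Parse statements; extension keywords are live only for the
    namespaces named by the caller."""
    active = {}
    for ns in namespaces:
        for keyword, parse_fn in EXTENSIONS[ns].items():
            if keyword in active:
                # Conflicts surface only when two namespaces are combined.
                raise SyntaxError(f"extension keyword {keyword!r} imported twice")
            active[keyword] = parse_fn

    tokens = source.replace(";", " ; ").split()
    pos, tree = 0, []
    while pos < len(tokens):
        head = tokens[pos]
        if head in active:
            # Left-marked: the first token alone selects the production.
            node, pos = active[head](tokens, pos)
        else:
            # Base grammar: any run of tokens up to ';' is a plain statement.
            end = tokens.index(";", pos)
            node, pos = ("stmt", tokens[pos:end]), end + 1
        tree.append(node)
    return tree

src = "repeat 3 beep; print hello;"
with_ext = parse(src, ["loops"])
without_ext = parse(src)
```

With the `loops` namespace active, `repeat` selects the extension production; without it, the same token is an ordinary name and the base rule applies. Because dispatch looks only at the leading token, two extensions can collide only on the keyword itself, and that is detected at import time.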
I think that by working along these lines it might be possible to add extensible syntax to a conventional language in a way that avoids both the grammatical difficulty and the composition problem. And if you do that, maybe you can then get the relevant committees or whoever to standardize the AST representation for the language.
I've never taken a crack at all this myself, though, because I'm happy writing Lisp :-)
My goal is not to add Lisp-like macros to every language. That would be a bit presumptuous; not all languages want Lisp-like macros.
My goal is to make ASTs as available and as easy to traverse and transform as they are in Lisp. This is the foundation that makes things like Lisp's macros as powerful as they are. Easy access to ASTs also enables many other things, such as static analysis, real syntax highlighting, and detecting syntax errors as you type.
In a way, Lisp-like macros are just a special case of tree transformation, one that puts the tree transformer inline with the source tree itself. But this is not the only possible approach. You could easily imagine an externally implemented tree transformer that did what GCC's -finstrument-functions does. This tree transformer could be written in any language; there's no inherent need to write it in C just because it's transforming C.
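As a small illustration of a transformer that lives outside the source it rewrites, here is a sketch using Python's standard `ast` module. It is only loosely analogous to -finstrument-functions: it handles entry only, not exit, and the hook name `__enter_hook__` is invented for the example. It also happens to transform Python with Python, simply because that is where an accessible AST is at hand.

```python
import ast

class InstrumentFunctions(ast.NodeTransformer):
    """Insert a call to a hypothetical entry hook at the top of each function."""

    def visit_FunctionDef(self, node):
        self.generic_visit(node)  # instrument nested functions too
        hook = ast.parse(f"__enter_hook__({node.name!r})").body[0]
        # Note: this goes before any docstring; a fuller version would skip it.
        node.body.insert(0, hook)
        return node

def instrument(source):
    """Parse source text, instrument it, and return the modified tree."""
    tree = InstrumentFunctions().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return tree

calls = []  # the hook just records function names here
source = "def square(n):\n    return n * n\n"

code = compile(instrument(source), "<instrumented>", "exec")
env = {"__enter_hook__": calls.append}
exec(code, env)
result = env["square"](4)
```

The original source is never edited as text; the hook is added to the tree and the tree is compiled directly.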
It's true that the authors of a compiler or interpreter could be reluctant to expose its internal AST format. But there's no reason that the AST being traversed and transformed has to use the same schema that is used internally; if you can translate the transformed AST back to text, that text can then be re-parsed into a completely different format. And with a correctly implemented AST-to-text component, this would not be a perilous and fragile process the way pure text substitution is.
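A quick sketch of why going through the tree is less fragile than substituting text, again using Python's `ast` module (`ast.unparse` needs Python 3.9 or later). The `Substitute` class and `substitute` helper are names invented for this example.

```python
import ast

class Substitute(ast.NodeTransformer):
    """Replace every use of one variable name with a parsed expression."""

    def __init__(self, name, replacement):
        self.name = name
        self.replacement = replacement

    def visit_Name(self, node):
        if node.id == self.name:
            # Splice in a subtree, not a string.
            return ast.parse(self.replacement, mode="eval").body
        return node

def substitute(source, name, replacement):
    """Parse an expression, substitute in the tree, and print it back as text."""
    tree = ast.parse(source, mode="eval")
    tree = Substitute(name, replacement).visit(tree)
    return ast.unparse(tree.body)

# Pure text substitution silently changes the meaning of the expression.
text_result = "x * 2".replace("x", "a + b")

# The tree version leaves it to the unparser to add the needed parentheses.
tree_result = substitute("x * 2", "x", "a + b")
```

Here `text_result` is `a + b * 2`, which no longer multiplies the whole sum, while `tree_result` is `(a + b) * 2`. The emitted text can be handed to any other parser, so the schema used for the transformation need not match the one the compiler uses internally.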
The author of Magpie has some interesting ideas about designing a language with extensible syntax. Sorry I can't find you a more specific link right away.