Hacker News | cshimmin's comments

That is simply not true. The naive “glorified auto-complete / stochastic parrot” argument may have some merit when applied to generic pre-trained models, which only learn from unsupervised next-token prediction. But the post-training through reinforcement learning that the frontier models undergo is very sophisticated, and they genuinely learn to do novel things that are purely the work of the model being trained (and the work of the GPUs they burn along the way, of course).


It's on the page, if you click the little info icon in the upper-right. Here's the text but there's some nice graphics there too:

  Snake Game, training entirely in the browser. Built on tinygrad: the rollout / targets / train graphs are TinyJits authored in Python, then compiled once to WGSL and replayed here under WebGPU.

  Observation: flat 10×10 board (100) + 4-dim prev-action one-hot = 104 dims. fc_pi.weight is zero-init so the opening policy is uniform over the legal actions; fc_v uses tinygrad's default Kaiming init.

  Per rollout: T=24 × N=384 parallel snakes (9,216 transitions), then K=3 epochs × 4 mini-batches of PPO updates. GAE γ=0.99, λ=0.95; AdamW wd=0.01; ratio clip ε=0.1; grad-norm 0.5; Huber value β=1, val_coef=1; entropy bonus 0.008333333333333333.

  Action mask + value clip + KL early stop. The 4-dim prev_a obs tail lets fc_pi zero the U-turn logit (the env silently overrides same-axis reversals anyway). Value loss is max(huber(v_new−td), huber(v_clip−td)) at ε=0.2. Approx-KL is sampled after each epoch and breaks the loop at 1.5·kl_target.
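For anyone unfamiliar with the value-clip and KL early-stop tricks mentioned in that text, here is a minimal plain-Python sketch. It is not the page's tinygrad code: the function names are hypothetical, `v_clip` is assumed to be the usual PPO-style clip of the new value around the old one, `target` stands in for what the text calls `td`, and the Huber form is the smooth-L1 variant (identical to the standard one at β=1).

```python
def huber(x, beta=1.0):
    """Huber (smooth-L1) loss: quadratic for |x| <= beta, linear beyond it."""
    a = abs(x)
    return 0.5 * a * a / beta if a <= beta else a - 0.5 * beta


def clipped_value_loss(v_new, v_old, target, eps=0.2, beta=1.0):
    """max(huber(v_new - target), huber(v_clip - target)).

    Assumption: v_clip keeps the new prediction within eps of the old one.
    """
    v_clip = v_old + max(-eps, min(eps, v_new - v_old))
    return max(huber(v_new - target, beta), huber(v_clip - target, beta))


def epochs_before_kl_stop(approx_kl_per_epoch, kl_target, max_epochs=3):
    """Count epochs run when approx-KL is checked after each epoch."""
    done = 0
    for kl in approx_kl_per_epoch[:max_epochs]:
        done += 1                      # this epoch's updates already happened
        if kl > 1.5 * kl_target:
            break                      # stop before starting another epoch
    return done
```

Taking the max of the two Huber terms means the value head gets no credit for jumping more than ε away from its old prediction in a single update, which is the same pessimistic idea as the policy ratio clip.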


Very helpful! Naïve question (I haven’t had a chance to read TFA at all and diffusion/flow models are not my area of expertise). Doesn’t learning the integral/solution of the diffusion process in a single pass just take us back to like OG generative CNN that we had before diffusion models took over? Surely the answer is “no” but would love to hear your framing as to why.


It kind of does! In the modern era of generative modelling, it seems like we rely on pre-training to capture the data distribution, and then on post-training (and various other tricks) to carve out a sliver of that distribution that we actually care about (i.e. what we want our model to generate).

To be able to specify that subset with relatively few examples, a good high-level understanding of the data distribution is necessary. The way I see it, training a diffusion model gets you to that point, and then once you've selected the part of the distribution you actually care about, you can distill it down quite aggressively, because you no longer need all of that computation to model a much simpler distribution (sometimes all the way to one step, but usually it's a few steps in practice).


When I was a research physicist I spent a lot of time looking at the effects of ionizing radiation in pictures, although mostly in the context of digital images. The mechanisms are a bit different for photo emulsions, but to me the reason I'd discount radiation is because they're specifically filtering for features that exhibit the expected point spread function (which is a geometric property of the telescope's optical assembly itself). I guess you could test by exposing emulsion plates to ionizing radiation and seeing how often you get PSF-like images by chance. Also, their search is for +/- 1 day of nuclear testing, which seems weird. Certainly radiation from fallout wouldn't make sense on the day before testing. It would have been useful to see +1 day and -1 day separately. Or 0-2 days. The way it's chosen makes me suspect they couldn't find a signal in those windows, and therefore it's probably just statistical noise that they've massaged out of the data.

But to me the biggest flag is that these images are from 50 minute exposures. The objects don't appear as streaks, so they are either very, very short flashes (much shorter than 50 min), or they are very far away. The authors interpret this to mean the objects should be in geosynchronous orbit, which doesn't make sense; objects in geosync would still appear to move relative to the star background over the course of 50 min. Yet this is the entire basis for their "shadow deficit" window calculation. You could constrain the duration vs distance by looking at the effect it would have on smearing the PSF, which would be interesting.

Overall it seems pretty unscientific. If you go looking through enough statistically noisy data for signals in enough places, you'll eventually find it.
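To put a rough number on that last point, here is a small sketch of the family-wise false positive rate. It assumes independent tests that all use the same significance threshold, which is a simplification; how many windows or cuts were actually tried in the paper is not known, so the counts below are purely illustrative.

```python
def familywise_false_positive_rate(alpha, n_tests):
    """Probability of at least one false positive across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


# Illustrative numbers of independent looks at pure noise, each at alpha = 0.05.
for n in (1, 5, 20, 100):
    print(n, round(familywise_false_positive_rate(0.05, n), 3))
```

At a 5% threshold, 20 independent looks at pure noise already give about a 64% chance of at least one "significant" result.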


Yes, 50-minute exposures would certainly rule out geosynchronous; I've used image stacking to look at geo and you get visible movement relative to the star background after even a few seconds. Fifty minutes would be roughly 12.5 degrees of movement relative to the background! This isn't even accounting for the fact that you would need to be looking in a narrow region above the equator to get something geosynchronous to begin with.
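A quick back-of-the-envelope check of the drift, assuming a geostationary object (fixed relative to the ground) near the celestial equator, so that the star field slides past it at the sidereal rate. The function name is just for illustration.

```python
import math

SIDEREAL_DAY_S = 86164.1  # one rotation of Earth relative to the stars, in seconds


def star_drift_deg(exposure_s, declination_deg=0.0):
    """Angular drift of the star field relative to an Earth-fixed direction."""
    rate_deg_per_s = 360.0 / SIDEREAL_DAY_S
    # On-sky drift shrinks with cos(declination); it is largest at the equator.
    return rate_deg_per_s * exposure_s * math.cos(math.radians(declination_deg))


print(star_drift_deg(50 * 60))        # degrees over a 50-minute exposure
print(star_drift_deg(1) * 3600)       # arcseconds per second
```

That works out to about 15 arcseconds per second, or about 12.5 degrees over 50 minutes, which is why even a few seconds of exposure shows motion against the stars.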

There are other possibilities that are likely: upper-atmosphere tests resulting in transient luminous phenomena. This would be more likely in certain conditions where the sun could reflect off of specular matter (e.g., bits of metal). You would see this most likely within 1-2 hours of sunset or 1-2 hours of sunrise (source: I've used optical equipment to spot satellites professionally).

I'd note that their pipeline for removing "plate defects" is not based on the PSF but on some vaguely defined "expert review" training. This can, and should, be a quantifiable step.


> The objects don't appear as streaks, so they are either very, very short flashes (much shorter than 50 min), or they are very far away.

Couldn’t it be aberrations in equipment, like lenses? Or film development?


As stated in the abstract, the anomalies occur more within a window around a nuclear event.


This precise point has been challenged, FWIW. See https://arxiv.org/pdf/2601.21946.


+/- 1 day of nuclear testing because these are old records so dates and times reported might be inaccurate.


  > Overall it seems pretty unscientific.
I'd agree with all your points and add some things to help people better "sniff-test" these kinds of papers.

  1) The paper is suggesting aliens... your suspicion hats should always go on
    - Carl Sagan said: "Extraordinary claims require extraordinary evidence". Is the evidence extra-ordinary?
  2) The authors aren't experts
    - Stephen Bruehl: A doctor of Anesthesiology
    - Brian Doherty: "Independent Researcher"[0]
    - Alina Streblyanska: Actually maybe an astrophysics researcher?[1]
    - Beatriz Villarroel: The top Google hit for her is for a UFO wikipedia[2]
  3) Authors don't share affiliations
    - Corresponding author has no domain expertise and no clear affiliation to others.
  4) Authors have hints of metric hacking
    - Villarroel has 8 citations in a paper with only 18[3]
  5) The GitHub repo is dead: https://github.com/dca-doherty/VASCO-ML
None of these things are enough to conclude that the paper is wrong, but they are red flags and don't require actually understanding any of the details of the paper.

If you do understand statistics there are clearly more red flags. The +/- windowing is a pretty big one, since there are much better tools for this (errors don't need to be symmetric! Nor do they need to be uniform!). There's also a pretty big assumption made that cshimmin didn't mention: the paper assumes all nuclear tests are in the public record. But I also assume if you have a strong statistics background then there's a high probability you didn't upvote the post.

[0] The man has effectively no online presence. Google searching his email yields effectively nothing except people posting about this paper in UFO groups (https://www.google.com/search?q=%22briandohertyresearch%40gm...). His linked GitHub also makes him anonymous (https://github.com/dca-doherty/) and his website linked is just about finding day care in Texas. He has one more paper on ArXiv, but it is from a few weeks prior

[1] Found their Linkedin (https://www.linkedin.com/in/alina-streblyanska-95b2375b/). Their most recent paper is also on UAPs, along with Villarroel. But also, they work for "Society of UAP Studies", which should be a big red flag. Also, they were working as a Post-doc for 12 years, which is a bit insane

[2] https://www.wikidisc.org/wiki/Beatriz_Villarroel and here's her Google Scholar https://scholar.google.com/citations?user=_Jc8gm0AAAAJ

[3] I looked at some other papers of hers and they show a similar pattern. This explains her citation count (which is rather low) and h-index (it's better to just click on the references and you'll see it's predominantly her referencing herself):

  - 2602.15171: 9 citations total, 8 are hers
  - "A cost-effective search for extraterrestrial probes in the Solar system" has many more, but still 6 to herself (and 3 to Loeb)
  - Transients in the Palomar Observatory Sky Survey (Yes, this is in "Nature"): 20 citations, 5 hers
  - Aligned, Multiple-transient Events in the First Palomar Sky Survey: 11/36
  - On the Image Profiles of Transients in the Palomar Sky Survey: 5/5
  - A Civilian Astronomer's Guide to UAP Research: 7/98 (actually not a red flag, but the title sure is...)
  - and so on


Not gonna lie, the first thing I noticed was that the first author was in an anesthesiology department. Your guidelines for sniff-testing are not unreasonable, and can definitely be helpful to people who are unfamiliar with the research area. But I quite intentionally did not appeal to any of those. As a (somewhat) subject matter expert, it's important to _ignore_ things like ad hominem judgement, and instead address the paper on its self-contained merits. And more importantly, to share my assessment of those with the lay public.


I'm glad you did it that way. I hope my comment works well as an addendum to your type of comment. I don't think it would have worked well on its own, nor prior to yours. Especially since nothing I said is an absolute rule that allows one to reject a work. But this paper sure does smell suspicious. I think it's good to have the stronger reasons to be suspicious and then understand some softer flags to navigate in unfamiliar territory.


It kinda sounds like a post-doc, in that it provides an on-ramp to working in the industry/institution. But without having to waste your time getting a PhD.


> But without having to waste your time getting a PhD

Ah yes, that 'waste of time' having to learn things in aeronautics and physics...


The 6 is part of 3.6, the model version. 35B parameters, A3B means it's a mixture of experts model with only 3B parameters active in any forward pass.


Got it. Thanks


Incidentally, I recently learned the origin of the term. Cyber - short for cybernetic - is from the Greek κυβερνήτης (kybernetes), meaning helmsman. The original use of cybernetics is in the context of automated control systems, so steering a rudder was a good analogy. It is also the origin of the name k8s.


In the early days of socialization on the Internet it had a very different meaning!!


In my headcanon, I still read k8s as "network of cubes", as in Borg cubes, as Kubernetes itself is a poor man's Borg (as in the thing that Google runs on, named after Star Trek Borg, known for cube-shaped ships referred to as "Borg cubes"). The whole kyber thing sounds like an explanation after the fact, to detach from the Collective legacy.


Have they ever released the full internal Borg toolset and software?


a/s/l?


Yeah perhaps a better term for Loser is Abstainer. Because the Sociopaths also can certainly lose at the game of maximum capitalist profit. Loser/Abstainer just chooses not to play the game.


The problem with these theories is that they fall apart as soon as you start adding or modifying the types. Because they aren't actually correct, just simple and flattering.


I think it'd be more accurate to say that in their neat essay form they are incorrect/incomplete, but that there's a kernel of truth to them.

Essays like this want to package an idea into a nice, easy to understand thing that has some punch to it. Reality is more complex.


Fully agreed. I think "Loser" is a misnomer. And indeed, going by the essay, the Sociopaths can also lose big... they are willing to risk it all for personal gain, but it can end very badly for them if they miss their window, their manipulations get exposed, or they decide to do illegal things to get ahead (high profile cases in my mind: Enron, Epstein, etc).


The names come from a cartoon that predates Rao's essay. He simply reused them because they mostly work. Just like the Sociopaths are not all literal sociopaths, the Losers are not all literal losers.


Yes, I understand this. I was simply making this explicit; it was a good idea to clarify that neither Losers nor Sociopaths match the common definition of those terms.


It's basically just a way for the LLM to lazy-load curated information, tools, and scripts into context. The benefit of making it a "standard" is that future generations of LLMs will be trained on this pattern specifically, and will get quite good at it.


> It's basically just a way for the LLM to lazy-load curated information, tools, and scripts into context.

So basically a reusable prompt, like the previous commenter asked?


Ah not exactly.

The way the OP phrased it:

> Is a skill essentially a reusable prompt that is inserted at the start of any query?

is actually a more apt description of a different Claude Code feature called Slash Commands, where I can create a preset "prompt" and call it with /name-of-my-prompt $ARGS. That feature is the one that essentially prefixes a prompt.

The other description, lazy loading, is more accurate for Skills. I can tell my Claude Code system, "Hey, if you need to run our dev server, see my-dev-server-skill",

and the agent will determine when to pull in that skill if it needs it.
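To make the lazy-loading idea concrete, here is a rough sketch of the pattern. This is not how Claude Code is actually implemented; `Skill`, `SkillRegistry`, and the skill body are all hypothetical names made up for illustration. The point is only that a short index sits in context up front while the full instructions are fetched on demand.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Skill:
    name: str
    description: str            # short summary, always visible to the model
    load: Callable[[], str]     # returns the full instructions on demand


class SkillRegistry:
    def __init__(self) -> None:
        self._skills: Dict[str, Skill] = {}
        self.loaded: List[str] = []   # names whose full bodies were pulled in

    def register(self, skill: Skill) -> None:
        self._skills[skill.name] = skill

    def index(self) -> str:
        """Cheap listing that would sit in context from the start."""
        return "\n".join(f"{s.name}: {s.description}" for s in self._skills.values())

    def use(self, name: str) -> str:
        """Load the full skill body only when the agent asks for it."""
        self.loaded.append(name)
        return self._skills[name].load()


registry = SkillRegistry()
registry.register(Skill(
    name="my-dev-server-skill",
    description="How to run our dev server",
    load=lambda: "Run `make dev` and wait for the port to open.",  # hypothetical body
))
```

A slash command, by contrast, would just be the body string pasted in front of the user's message every time it is invoked.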


Yes, but with more sales magic sprinkled on top.


Does it persist the loaded information for the remainder of the conversation or does it intelligently cull the context when it's not needed?


This question doesn’t have anything to do with skills per se, this is just about how different agents handle context. I think right now the main way they cull context is by culling noisy tool call output. Skills are basically saved prompts and shouldn’t be that long, so they would probably not be near the top of the list of things to cull.


Claude Code subagents keep their context windows separate from the main agent, sending back only the most relevant context based on the main agent's request.


Each agent will do that differently, but Gemini CLI, for example, lets you save any session with a name so you can continue it later.

