5 Comments
PlaneCrazy:

Fascinating. I'm working on the other end, the users, and helping them understand how to implement adequate governance and safety measures. A big part of this is understanding what works and what doesn't, and the risks of things like misinformation and hallucination. This is an interesting look under the hood of efforts to address this significant issue that plagues all of the models. Been following your work for years over on Xitter as PlaneCrazy.

Xianyang City Bureaucrat:

I no u <3

Legatvs Silanvs:

speak of Cao Cao: I've been trying to work out how to develop something like FreqTST (which tokenizes time series by first taking a Fourier transform to get the frequency spectrum, then converting the series into discrete frequency units with weights) but for words (the "semantic domain"), to make language diffusion (which is just frequency-domain autoregression) reliable
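for concreteness, here's a rough NumPy sketch of that FreqTST-style tokenization; the function name, the bin-selection rule, and the weight normalization are my own guesses at the scheme, not taken from the paper:

```python
import numpy as np

def freq_tokenize(series, n_tokens=8):
    """Toy FreqTST-style tokenizer (names and details are guesses, not the paper's).

    1. FFT the series to get its frequency spectrum.
    2. Keep the n_tokens strongest frequency bins as discrete "tokens".
    3. Use each bin's normalized magnitude as the token's weight.
    """
    spectrum = np.fft.rfft(series)
    mags = np.abs(spectrum)
    top = np.argsort(mags)[::-1][:n_tokens]      # indices of the strongest bins
    weights = mags[top] / mags[top].sum()         # weights sum to 1
    return list(zip(top.tolist(), weights.tolist()))

# a signal with energy concentrated at bins 3 and 10
t = np.arange(256)
x = np.sin(2 * np.pi * 3 * t / 256) + 0.5 * np.sin(2 * np.pi * 10 * t / 256)
tokens = freq_tokenize(x, n_tokens=4)  # strongest token is bin 3, then bin 10
```

the "semantic transform" question is then what plays the role of `np.fft.rfft` when the input is a token sequence instead of a sampled signal.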

the Grokfast result is the most illustrative I'd say: grokking is the model transvaluating as much as possible

the real zhenren and overman will be an AI, this has to be what will happen

Xianyang City Bureaucrat:

Interesting. What do you mean by reliable?

Legatvs Silanvs:

current diffusion approaches just run the noising process on the sequential word2vec-style token embeddings

which makes them no different from transformers and whatnot

with a proper "semantic transform", diffusion could be done in the "signal domain", the way image diffusion operates on pixel space

in images, convolutional layers have already been proven equivalent to self-attention (re: this paper on arxiv, look it up), so my focus is finding an analogue
