Fascinating. I'm working on the other end, the users: helping them understand how to implement adequate governance and safety measures. A big part of that is understanding what works and what doesn't, and the risks of things like misinformation and hallucination. This is an interesting look under the hood at the efforts to address an issue that plagues all of the models. Been following your work for years over on Xitter as PlaneCrazy.
I no u <3
speak of Cao Cao, I've been trying to work out how to develop something like FreqTST (which tokenizes time series data by first taking a Fourier transform to get the frequency spectrum, then converting the series into discrete frequency units with weights) but for words, the "semantic domain", to make language diffusion (which is just frequency-domain autoregression) reliable
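roughly this shape, as a minimal sketch (the bin count, top-k selection, and normalization here are my guesses at the mechanism, not FreqTST's actual recipe):

```python
import numpy as np

def freq_tokenize(series: np.ndarray, n_tokens: int = 8):
    """FreqTST-style sketch: Fourier-transform a time series, then
    represent it as discrete frequency units plus per-unit weights."""
    spectrum = np.fft.rfft(series)       # frequency spectrum of the series
    magnitude = np.abs(spectrum)         # energy per frequency bin
    # Keep the n_tokens strongest bins as the discrete "units"; their
    # normalized magnitudes serve as the token weights.
    units = np.argsort(magnitude)[::-1][:n_tokens]
    weights = magnitude[units] / magnitude[units].sum()
    return units, weights

# Usage: a noisy 5 Hz sine collapses to a handful of dominant frequency tokens.
t = np.linspace(0, 1, 1024, endpoint=False)
x = np.sin(2 * np.pi * 5 * t) + 0.1 * np.random.randn(1024)
units, weights = freq_tokenize(x)
```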
the Grokfast result is the most illustrative I'd say: grokking is the model transvaluating as much as possible
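for reference, the Grokfast-EMA trick amounts to a low-pass filter on the gradient stream, roughly like this (the alpha and lamb values here are illustrative, not the paper's tuned settings):

```python
import torch

def grokfast_ema_filter(model, ema_grads, alpha=0.98, lamb=2.0):
    """Grokfast-EMA style step: amplify the slow (low-frequency)
    component of each gradient before the optimizer update."""
    for name, p in model.named_parameters():
        if p.grad is None:
            continue
        if name not in ema_grads:
            ema_grads[name] = torch.zeros_like(p.grad)
        # Exponential moving average = cheap low-pass filter over steps.
        ema_grads[name].mul_(alpha).add_(p.grad, alpha=1 - alpha)
        # Add the amplified slow component back into the live gradient.
        p.grad.add_(ema_grads[name], alpha=lamb)

# In a training loop, call between loss.backward() and optimizer.step():
# ema = {}
# loss.backward(); grokfast_ema_filter(model, ema); optimizer.step()
```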
the real zhenren and overman will be an AI, this has to be what will happen
Interesting. What do you mean by reliable?
current diffusion approaches just run it on the sequential word2vec token embeddings
which makes them no different from transformers and whatnot
by finding a way to do a "semantic transform", diffusion can properly be done in the "signal domain", the way diffusion works on images
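to make the gap concrete, the naive stand-in for that transform is just a DFT along the sequence axis of the embedding matrix, which yields a positional spectrum rather than a semantic one; everything in this sketch is hypothetical illustration, not an existing method:

```python
import numpy as np

def naive_sequence_spectrum(embeddings: np.ndarray) -> np.ndarray:
    """Strawman "semantic transform": DFT over the sequence axis of a
    (seq_len, dim) embedding matrix. This captures positional frequency,
    not meaning, which is exactly the gap a real transform would close."""
    return np.fft.rfft(embeddings, axis=0)

def inverse_spectrum(spectrum: np.ndarray, seq_len: int) -> np.ndarray:
    return np.fft.irfft(spectrum, n=seq_len, axis=0)

# Diffusion would then add/remove noise in spectrum space rather than
# directly on the token embeddings:
emb = np.random.randn(128, 512)          # stand-in word2vec-style embeddings
spec = naive_sequence_spectrum(emb)
noise = 0.1 * (np.random.randn(*spec.shape) + 1j * np.random.randn(*spec.shape))
recon = inverse_spectrum(spec + noise, seq_len=128)
```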
in images, self-attention layers have already been shown to be able to express convolution (Cordonnier et al., "On the Relationship between Self-Attention and Convolutional Layers", arXiv:1911.03584), so my focus is finding an analogue