You just witnessed an AlexNet moment in RAG because MaxSim is a Submodular Norm
CS, LLM, ML


This past week, our R&D team beat the BrowseComp-Plus benchmark, and I believe this is an AlexNet moment for RAG. The first AlexNet moment occurred in 2012, when deep neural networks were shown to dramatically reduce the error on a benchmark that had been difficult to beat for years. It started the whole deep neural network revolution, ultimately leading us into this timeline of generative AI with LLMs. I believe we have reached the same moment for RAG with a technique developed by a whole community. It took some time, as you will find out by reading the story below.

This blog post is also the story of how innovation works. It is a long research journey inspired by many.


Getting this innovation into LightOn's product, so that Search and Reason becomes the best for our customers, is one of the most exciting parts of the story. Note that all these innovations have been made open source. Buckle up, it's a fun mathy ride.

The Mathematical Object Everyone Overlooked

There's a class of functions in combinatorial optimization called submodular functions. Their defining property is diminishing marginal returns: adding an element to a small set gives you at least as much as adding it to a larger set that contains the small one. Formally, for any sets A ⊆ B and any element x not in B:

f(A ∪ {x}) − f(A) ≥ f(B ∪ {x}) − f(B)

This isn't an analogy. It's a mathematical structure with forty years of theory behind it. Submodular maximization has known greedy approximation guarantees (1 − 1/e ≈ 0.63 for monotone submodular functions under a cardinality constraint). Facility location, max-coverage, sensor placement, and document summarization are all instances of the same structure.
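To make the greedy guarantee concrete, here is a minimal sketch of greedy maximization on a max-coverage instance. The toy sets and the function names (`coverage`, `greedy_max_coverage`) are illustrative assumptions, not taken from any library.

```python
def coverage(selected, sets):
    """Number of distinct items covered by the chosen sets (monotone submodular)."""
    covered = set()
    for name in selected:
        covered |= sets[name]
    return len(covered)


def greedy_max_coverage(sets, k):
    """Pick up to k sets, each time adding the one with the largest marginal gain."""
    selected = []
    for _ in range(k):
        best_name, best_gain = None, 0
        for name in sets:
            if name in selected:
                continue
            gain = coverage(selected + [name], sets) - coverage(selected, sets)
            if gain > best_gain:
                best_name, best_gain = name, gain
        if best_name is None:  # no remaining set adds new coverage
            break
        selected.append(best_name)
    return selected


# Hypothetical toy instance: four candidate sets over the items 1..6.
toy_sets = {
    "a": {1, 2, 3, 4},
    "b": {3, 4, 5},
    "c": {5, 6},
    "d": {1, 2},
}
chosen = greedy_max_coverage(toy_sets, k=2)
print(chosen, coverage(chosen, toy_sets))
```

Note how "b" is worth 3 items on its own but only 1 once "a" is already selected: that drop in marginal gain is the diminishing-returns property at work.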

MaxSim is an instance of this structure.

In ColBERT's late-interaction scoring, a query has Q token embeddings and a document has D token embeddings. The relevance score is:

Score = Σᵢ maxⱼ sim(qᵢ, dⱼ)

Each query token finds the document token it matches best. The document score is the sum of these per-token best matches.
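The formula can be written out in a few lines. This is a sketch in plain Python that assumes the similarity is a dot product over small toy vectors; the function names and the vectors are made up for illustration.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim(query_tokens, doc_tokens):
    """Sum over query tokens of the best similarity to any document token."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Toy 2-dimensional token embeddings (hypothetical values).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # matches both query tokens
doc_b = [[1.0, 0.0], [0.9, 0.1]]              # matches only the first one well

score_a = maxsim(query, doc_a)
score_b = maxsim(query, doc_b)
print(score_a, score_b)
```

The document that matches both query tokens scores higher than the one that matches a single token twice.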

This is a facility location objective. The query tokens are the "clients." The document tokens are the "facilities." Each client is served by its best-matching facility. The total score measures how well the document covers the query's semantic content. And this coverage function is submodular in the set of document tokens (taking an empty document to score zero and similarities to be non-negative): adding a new document token provides diminishing marginal improvement as more tokens already cover the query's semantic space.

The diminishing returns here aren't a bug. They're the reason MaxSim works.
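The diminishing returns can be checked numerically. In this sketch the set that grows is the set of document tokens (the set MaxSim takes its max over); similarities are floored at zero so that an empty document scores zero. Both choices, the names, and the toy vectors are assumptions made for the illustration.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def coverage_score(query_tokens, doc_tokens):
    """MaxSim with similarities floored at 0, so an empty document scores 0."""
    total = 0.0
    for q in query_tokens:
        best = 0.0
        for d in doc_tokens:
            best = max(best, dot(q, d))
        total += best
    return total


def marginal_gain(query_tokens, doc_tokens, new_token):
    """Score increase from adding one more token to the document."""
    with_token = coverage_score(query_tokens, doc_tokens + [new_token])
    return with_token - coverage_score(query_tokens, doc_tokens)


query = [[1.0, 0.0], [0.0, 1.0]]
small_doc = [[0.6, 0.0]]
large_doc = small_doc + [[0.9, 0.2]]  # a superset of small_doc
new_token = [0.8, 0.5]

gain_small = marginal_gain(query, small_doc, new_token)
gain_large = marginal_gain(query, large_doc, new_token)
print(gain_small, gain_large)
```

The same token is worth more to the smaller document than to its superset, which is the inequality f(A ∪ {x}) − f(A) ≥ f(B ∪ {x}) − f(B) on one concrete instance.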

Why Submodularity Is the Right Norm for Retrieval

Retrieval is fundamentally a coverage problem. A query expresses an information need. A relevant document covers that need across multiple facets — facts, context, reasoning chains, supporting evidence. The scoring function's job is to measure how well the document covers the query.

Submodular functions are the mathematical tool for coverage. Their diminishing-returns property encodes exactly what you want:

Early matches are high-value. The first document token that strongly matches a query token contributes a lot to the score. This captures the dominant signal.

Redundant matches are naturally discounted. If two document tokens match the same query token, only the stronger match counts. MaxSim doesn't double-count.

Diverse evidence is rewarded. A document that matches different query tokens across different facets scores higher than one that matches the same facet repeatedly.

This is why MaxSim exhibits strong out-of-domain generalization. The submodular structure doesn't depend on the domain; it depends on the geometry of coverage. A legal document covers a legal query the same way a biomedical paper covers a biomedical query: by matching diverse facets of the information need.

The Single-Vector Mistake

Now contrast this with dense single-vector retrieval. A document is compressed into one embedding. Similarity is a dot product or cosine between the query vector and the document vector.

This is a linear scoring function. There's no submodular structure. No diminishing returns in the matching because there's only one match. No coverage because there's only one point.
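A toy comparison makes the contrast visible. This sketch assumes mean pooling as the way to build the single vector, which is a simplification (real dense models learn their pooling); all names and vectors are hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mean_pool(tokens):
    """Collapse token vectors into one vector by averaging each dimension."""
    n = len(tokens)
    return [sum(t[i] for t in tokens) / n for i in range(len(tokens[0]))]


def dense_score(query_tokens, doc_tokens):
    """Single-vector scoring: one dot product between pooled vectors."""
    return dot(mean_pool(query_tokens), mean_pool(doc_tokens))


def maxsim(query_tokens, doc_tokens):
    """Late-interaction scoring: sum of per-query-token best matches."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


query = [[1.0, 0.0], [0.0, 1.0]]        # two facets
doc_two_facets = [[1.0, 0.0], [0.0, 1.0]]
doc_one_facet = [[1.0, 0.0], [1.0, 0.0]]

dense = (dense_score(query, doc_two_facets), dense_score(query, doc_one_facet))
late = (maxsim(query, doc_two_facets), maxsim(query, doc_one_facet))
print(dense, late)
```

On this instance the pooled dot product gives both documents the same score, while MaxSim separates the document that covers both facets from the one that repeats a single facet.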

The entire document is projected through a single bottleneck, and all facets of meaning must coexist in one vector. When the query is simple, this works.

When the query requires reasoning across multiple facets (the kind of query that matters in enterprise search, in Deep Research, in agentic retrieval), the single vector doesn't have the representational capacity to capture what's needed.

The industry response: make the model bigger. 1B. 4B. 8B parameters. Each increase improves the quality of the single embedding, but the improvement curve flattens. This is diminishing returns in the wrong place: in the scaling law of the model, where each additional parameter buys less accuracy because the architectural bottleneck (one vector) hasn't changed.

Submodularity tells you exactly why this fails.

Coverage problems require a coverage objective. You can't solve a submodular problem with a linear scoring function by making the linear function more expensive to compute.

LightOn's Stack: Engineering the Right Mathematical Object

Knowing that MaxSim is the right scoring function is the easy part. The hard part is making it trainable, servable, and deployable at enterprise scale. LightOn built that infrastructure layer by layer, each solving a specific engineering barrier.

The token representations: ModernBERT (December 2024)


MaxSim's quality is determined by the quality of the individual token embeddings. Each token is a point in the semantic space; MaxSim computes coverage in that space. Better points, better coverage.

ModernBERT (co-developed with AnswerAI) modernized the encoder: 8,192-token context, Flash Attention 2, rotary positional embeddings, 2 trillion training tokens. The atomic unit of MaxSim improved across the board. ModernBERT has been downloaded 37 million times so far.


The domain adaptation proof: BioClinical ModernBERT (June 2025)


A strong encoder is only useful if it transfers to specialized domains without retraining from scratch. BioClinical ModernBERT — a collaboration between the Dana-Farber Cancer Institute, Harvard, MIT, McGill, Albany Medical College, Microsoft Research, and LightOn — tested this by continuing ModernBERT's pre-training on medical texts.

A lesser-known scheduling feature of ModernBERT enables seamless continued pre-training: stable-phase checkpoints and a decay phase eliminate cold restarts. The team leveraged this to produce a new SOTA on medical classification and Named Entity Recognition, outperforming every existing medical encoder. Clinical notes and medical reports are long — exactly the regime where ModernBERT's hybrid attention and 8,192-token context matter most.

For the submodularity argument, this is a multiplier. MaxSim computes coverage in the space of token embeddings. If those embeddings can be cheaply specialized to biomedical, legal, financial, or defense domains — without retraining the entire stack — then the coverage function adapts to the domain for a fraction of the cost of training a new large model. BioClinical ModernBERT proved the recipe is reproducible: anyone can adapt ModernBERT to their vertical.

The architectural proof: Ettin (July 2025)


A natural objection: maybe the encoder-only architecture isn't actually better for retrieval. Maybe a sufficiently large decoder can match it. After all, projects like LLM2Vec proposed converting decoders into retrievers.

Ettin, a collaboration between Johns Hopkins University and LightOn, settled this with the first controlled experiment. Six model sizes from 17M to 1B parameters, trained on identical data (2T tokens of fully open data), identical recipes (the ModernBERT training pipeline), identical architecture shapes. The only difference: encoder (bidirectional attention, MLM objective) vs. decoder (causal attention, CLM objective).

The results were unambiguous. A 150M encoder (89.2 on MNLI) outperformed a 400M decoder (88.2). On retrieval tasks, the gap was even larger. Cross-objective training — continuing to train a decoder with the encoder's MLM objective — still trailed native encoders.

This matters for the submodularity argument. MaxSim computes coverage in the space of token embeddings. Bidirectional attention lets each token see the full document context, producing richer representations at every position. Causal attention restricts each token to its left context — the first token sees nothing, the second sees one token, and so on. For a facility location objective where every token is a potential facility, bidirectional representations are strictly more informed.
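The difference between the two attention patterns can be seen by building the masks directly. This is a minimal sketch with hypothetical helper names; a 1 at row i, column j means position i may attend to position j, and in this standard form of the causal mask each position also attends to itself.

```python
def causal_mask(n):
    """Lower-triangular mask: position i sees positions 0..i only."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Full mask: every position sees every position."""
    return [[1] * n for _ in range(n)]


def visible_counts(mask):
    """How many positions each token can attend to (itself included)."""
    return [sum(row) for row in mask]


causal_counts = visible_counts(causal_mask(4))
bidirectional_counts = visible_counts(bidirectional_mask(4))
print(causal_counts, bidirectional_counts)
```

With causal attention the amount of context grows from left to right, so early tokens are embedded with very little information about the rest of the document; with bidirectional attention every token gets the same full view.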

Ettin proved this isn't a theory — it's a measurable architectural advantage that holds across six model scales, on identical data, with identical training. Encoders are fundamentally better at producing the token representations MaxSim needs.

Two practical consequences followed.

First, the Ettin encoders beat ModernBERT across all sizes while using entirely open, reproducible training data — validating that the recipe, not proprietary data, is what matters.

Second, the 17M Ettin encoder became the backbone for LateOn-Code-edge, the ultra-fast code retrieval model that runs locally inside ColGrep. The smallest point on the Ettin scale turned out to be exactly the right size for a single-binary semantic search tool.

The training: PyLate (2024–2025)


ColBERT training used to require bespoke pipelines. PyLate (accepted at CIKM 2025) reduced it to ~80 lines of code and under 2 hours on a single GPU. It is the first peer-reviewed library for late-interaction model training. Submodular retrieval became as easy to ship as a bi-encoder.

The multi-vector search: FastPlaid and NextPlaid (2025–2026)


MaxSim requires storing and searching per-token embeddings. FastPlaid, a Rust rewrite of Stanford's PLAID engine, delivered a 554% throughput improvement. NextPlaid packaged it as a local-first multi-vector database with REST API, Docker, and ONNX INT8 quantization.

The cost of computing a submodular scoring function at scale dropped to production-viable levels.

The lexical complement: BM25X (2025–2026)

Submodular doesn't mean universal. BM25 handles exact keyword matching, acronyms, and identifiers: the cases where the semantic space isn't where the action is.

LightOn's Rust BM25 engine provides streaming mutations, mmap indices, and pre-filtered search up to 600× faster. BM25X and MaxSim cover different failure modes. The full stack uses both.
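For readers who have not seen the lexical side written down, here is a sketch of textbook Okapi BM25 scoring. It is not BM25X's code: the IDF variant (the non-negative form used by Lucene), the parameter values k1=1.5 and b=0.75, the whitespace tokenization, and the toy corpus are all assumptions for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query terms with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    doc_freq = Counter()
    for d in docs:
        doc_freq.update(set(d))  # count each term once per document
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue  # exact match only: no credit for related words
            df = doc_freq[term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            length_norm = k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


corpus = [
    "error code E1234 raised by the billing service".split(),
    "the billing service sends invoices every month".split(),
    "semantic search finds related passages".split(),
]
scores = bm25_scores(["E1234"], corpus)
print(scores)
```

Only the document that literally contains the identifier gets a non-zero score, which is the behavior a semantic scorer does not guarantee for rare tokens such as error codes.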

The document pipeline: LightOnOCR-2 (January 2026)


MaxSim needs tokens. The most valuable enterprise documents are locked in scanned PDFs. LightOnOCR-2 — 1B parameters, SOTA on OlmOCR-Bench, 9× smaller and 3.3× faster than Chandra-9B — converts them to text. On-prem, behind the firewall.

No tokens, no coverage. OCR is the front door.

The proof on standard retrieval: GTE-ModernColBERT (May 2025)


First model to beat ColBERT-small on BEIR — 18 heterogeneous datasets covering biomedical search, open QA, argument analysis, forums, and scientific knowledge bases. Token-level coverage, powered by a modern encoder, outperformed dense models on cross-domain generalization.

But GTE-ModernColBERT was built the way everyone builds ColBERT models: take a strong dense (single-vector) pre-trained model, bolt on a knowledge distillation step in the multi-vector setting at the very end. The submodular objective was an afterthought: the last fine-tuning phase, not the training paradigm.

This left an obvious question hanging.

Training in the submodular objective from day zero: ColBERT-Zero (February 2026)


If MaxSim is the right scoring function, that is, if submodular coverage is the right mathematical structure for retrieval, then why are we training models with the wrong objective for 95% of the pipeline and only switching to the right one at the end?

ColBERT-Zero, a collaboration between Ecole Polytechnique Fédérale de Lausanne (EPFL) and LightOn, answered this by performing contrastive pre-training directly in the multi-vector setting from the very first phase. Not as a final distillation step. From zero.

The result was striking. A dense baseline trained on GTE's proprietary data scored 55.33 nDCG@10 on BEIR. A dense baseline trained on Nomic's public data scored 52.89, a 2.4-point data quality gap. ColBERT-Zero, trained entirely on public data but in the multi-vector objective from scratch, reached 55.43, closing and surpassing the proprietary-data gap.

Read that again. Public data, worse by 2.4 points in the dense setting, beats proprietary data when you train in the submodular objective from the start.

This is the purest evidence for the submodularity thesis. The conventional pipeline (dense pre-training → dense supervised → multi-vector distillation) treats MaxSim as a post-hoc refinement. ColBERT-Zero shows it's a training paradigm. When the encoder learns token-level importance signals from the first gradient, through PyLate's GradCache (scaling to ~16K effective batch size without VRAM constraints) and cross-GPU gathering, it develops representations that are fundamentally different from what dense pre-training produces. The tokens learn to be good at being facility locations, not good at being compressed into a single point.
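To give a feel for what "contrastive training in the multi-vector setting" optimizes, here is a sketch of an in-batch contrastive loss whose logits are MaxSim scores. It is not PyLate's implementation: GradCache and cross-GPU gathering are not shown, and the `temperature` parameter, function names, and toy embeddings are assumptions.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim(query_tokens, doc_tokens):
    """Sum over query tokens of the best similarity to any document token."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


def in_batch_contrastive_loss(queries, docs, temperature=1.0):
    """Mean cross-entropy where docs[i] is the positive for queries[i]
    and every other document in the batch serves as a negative."""
    losses = []
    for i, q in enumerate(queries):
        logits = [maxsim(q, d) / temperature for d in docs]
        top = max(logits)  # subtract the max for numerical stability
        log_norm = top + math.log(sum(math.exp(x - top) for x in logits))
        losses.append(log_norm - logits[i])
    return sum(losses) / len(losses)


# Two queries, each with one token; documents in matching and swapped order.
queries = [[[1.0, 0.0]], [[0.0, 1.0]]]
docs_aligned = [[[1.0, 0.0]], [[0.0, 1.0]]]
docs_swapped = [docs_aligned[1], docs_aligned[0]]

loss_aligned = in_batch_contrastive_loss(queries, docs_aligned)
loss_swapped = in_batch_contrastive_loss(queries, docs_swapped)
print(loss_aligned, loss_swapped)
```

The loss is lower when each query's positive document is the one its tokens cover best, so gradients from this objective push individual token embeddings toward being good match targets from the first training step.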

The practical finding was equally important: performing a supervised contrastive step before distillation closes most of the gap at a fraction of the cost. And prompt alignment between pre-training and fine-tuning is non-negotiable: stripping asymmetric prompts degrades performance significantly.

All models, intermediate checkpoints, and training scripts were released under Apache 2.0, including the SOTA on BEIR for models under 150M parameters.

The proof on reasoning: Reason-ModernColBERT (May 2025)


Fine-tuned for reasoning-intensive retrieval. 149M parameters. Outperformed every model up to 7B on BRIGHT, including ReasonIR-8B trained on identical data.

This is where the submodular argument bites hardest. Reasoning queries have multiple implicit facets: preconditions, intermediate steps, conclusions. A single vector can capture the dominant facet. MaxSim captures the coverage across facets. Same data, same task: the model with the submodular scoring function won, at 54× fewer parameters.

The proof on code: LateOn-Code + ColGrep (February 2026)


Code retrieval requires matching function signatures, variable names, docstrings, and structural patterns simultaneously. This is a multi-facet coverage problem. LateOn-Code (17M and 130M params) topped the MTEB Code leaderboard. ColGrep brought MaxSim to the terminal, beating grep 70% of the time while cutting agent token usage.

The deep reader after retrieval: OriOn (February 2026)


MaxSim solves the coverage problem: which documents address the facets of the query? But coverage is the first step. Once the retriever surfaces the right documents, an agentic system needs to read them — deeply, across hundreds of pages, without losing coherence.

This is a fundamentally different problem from retrieval. Retrieval is submodular coverage over a large corpus. Deep reading is long-context reasoning over a small, retrieved set. The two are complementary, and an enterprise pipeline needs both.

OriOn is LightOn's family of long-context visual language models. The 32B-parameter model processes up to 250 pages at full visual resolution in a single pass, matching or exceeding models 7× its size on the most challenging long-document benchmarks. On MMLBD-C — LightOn's manually corrected version of MMLongBenchDoc, the hardest benchmark for long-context visual document understanding — OriOn-Qwen-32B achieved 57.3, surpassing even its 235B teacher model (56.2). For context: expert human accuracy on this benchmark is roughly 65.8%, and GPT-4o scores 46.3%.

The connection to MaxSim is direct. In an agentic RAG pipeline, MaxSim's submodular scoring retrieves the right pages from millions of documents. OriOn then ingests those pages, not as extracted text chunks, but as rendered visual documents, preserving tables, charts, formatting, and layout, and reasons across them in a single forward pass. Thanks to prefix caching, each subsequent turn in an agentic loop is near-instant.

The training insights were released openly (50+ ablation experiments), and several challenged prevailing assumptions: training on genuinely long contexts that exceed your evaluation distribution can hurt performance; visual long-context training transfers strongly to text-only benchmarks (+11.5 points on HELMET from visual-only training); and a novel recursive answer generation pipeline enables self-improvement without a stronger teacher model.

OriOn completes the pipeline that MaxSim starts. Submodular coverage finds the evidence. Long-context deep reading reasons over it. Both deploy on sovereign infrastructure, on-prem, behind the firewall.

The AlexNet Moment: BrowseComp-Plus (March 2026)


BrowseComp-Plus is the ultimate coverage problem. 830 queries, each requiring 2+ hours for a human. Fixed 100K-document corpus. Paired with a reasoning LLM (GPT-5), the retriever's job is to find the documents that cover every facet of a complex information need, often across multiple rounds of search.

[Figure: Open and closed models directly benefit from Reason-ModernColBERT]

Reason-ModernColBERT + GPT-5: 87.59% accuracy. 7.59 points above the previous best. First place on accuracy, recall, calibration, and search efficiency (13.27 calls vs. 21+).

[Figure: BrowseComp-Plus leaderboard]

The efficiency gain is a direct consequence of the submodular structure. MaxSim gives the LLM token-level evidence about which parts of a document match which parts of the query. The LLM reads this signal and decides which documents deserve a full read before committing tokens. One additional function, get_document(id), is enough. No reranker. No oracle chunking.
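What "token-level evidence" can look like is easy to sketch: for each query token, report which document token it matched best and how strongly. How the actual system exposes this signal to the LLM is not described here, so the `token_evidence` helper, the tuple layout, and the toy vectors below are purely hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def token_evidence(query_tokens, doc_tokens):
    """For each query token, return (query_index, best_doc_index, similarity)."""
    evidence = []
    for qi, q in enumerate(query_tokens):
        sims = [dot(q, d) for d in doc_tokens]
        best = max(range(len(sims)), key=sims.__getitem__)
        evidence.append((qi, best, sims[best]))
    return evidence


query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[1.0, 0.0], [0.7, 0.1], [0.1, 0.9]]

evidence = token_evidence(query, doc)
maxsim_score = sum(sim for _, _, sim in evidence)
print(evidence, maxsim_score)
```

The MaxSim score is just the sum of the last column, so the alignment comes for free with the score; a single dot product between two pooled vectors has no such per-token breakdown to offer.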

Dense retrievers provide a single similarity score. The LLM has to guess what the document contains. Guessing takes more rounds. More rounds cost more tokens. Diminishing returns in the wrong place.

Making Sense of it all

Submodular functions are the mathematical formalization of diminishing marginal returns. MaxSim is a submodular norm: specifically, a facility location objective where document tokens cover query tokens. This structure is inherently suited to retrieval and RAG because retrieval is a coverage problem: does this document address the diverse facets of my information need?

Single-vector models replace this submodular structure with a linear scoring function and try to compensate with scale, hitting diminishing returns in model size instead of harnessing diminishing returns in the scoring function where they belong. ColBERT-Zero proved that training in the submodular objective from scratch, not as an afterthought, is what unlocks the full ceiling: public data beating proprietary data when the training paradigm is right.

LightOn built the infrastructure to make MaxSim production-ready: modern encoder, native multi-vector training, Rust search engines, OCR pipeline, and OriOn for deep reading after retrieval. The result is a 149M-parameter retriever leading the hardest benchmark in the world, paired with a 32B deep reader that matches models 7× its size, all deployable on sovereign infrastructure. The math was always right. The engineering caught up.

And we are not done yet!


Other links:
Paris Machine LearningMeetup.com||@Archives||LinkedIn||Facebook|| @ParisMLGroup About LightOnNewsletter ||@LightOnIO|| on LinkedIn || on CrunchBase || our Blog
About myselfLightOn || Google Scholar || LinkedIn ||@IgorCarron ||Homepage||ArXiv
LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model for State-of-the-Art OCR
AI, LLM
The results are simply impressive.

LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model for State-of-the-Art OCR by Said Taghadouini, Adrien Cavaillès, Baptiste Aubertin
We present LightOnOCR-2-1B, a 1B-parameter end-to-end multilingual vision-language model that converts document images (e.g., PDFs) into clean, naturally ordered text without brittle OCR pipelines. Trained on a large-scale, high-quality distillation mix with strong coverage of scans, French documents, and scientific PDFs, LightOnOCR-2 achieves state-of-the-art results on OlmOCR-Bench while being 9× smaller and substantially faster than prior best-performing models. We further extend the output format to predict normalized bounding boxes for embedded images, introducing localization during pretraining via a resume strategy and refining it with RLVR using IoU-based rewards. Finally, we improve robustness with checkpoint averaging and task-arithmetic merging. We release model checkpoints under Apache 2.0, and publicly release the dataset and LightOnOCR-bbox-bench evaluation under their respective licenses.
More information on LightOn's blog.

Science Discovery: The Advanced Matrix Factorization and Decomposition Jungle Page
CS, HighlyTechnicalReferencePage, MF, ML
The Advanced Matrix Factorization and Decomposition Jungle page has a new home. It is at:
https://igorcarron.github.io/welcome-to-the-matrix-factorization-jungle/
What is new? Not much: under the pressure of making LLMs either faster or more specialized, LoRAs are center stage, with a multitude of approaches. Oh, and an agent helps in building the page.
Know of a decomposition or factorization technique, an implementation, or a phase transition that's missing? Open an issue and mention the paper reference (arXiv, bioRxiv, DOI, TechRxiv, etc.) where you found it, a brief description of the factorization or decomposition (a new one or one that has already been identified in the page), and ideally a link to code/repo. If you have identified a phase transition, please mention the article and the figure in which it is viewable (as a bound or as a graph).














A Paradigm Shift: Reasoning at Enterprise Scale
LLM, ML

When performing retrieval at scale on large sets of enterprise documents, it becomes very clear that current Retrieval Augmented Generation (RAG)-like approaches are not well suited, irrespective of context windows becoming very large. The "RAG is dead" meme that comes out every so often willfully ignores that:
  • most interesting sets of documents are always beyond the latest largest context window that the cool kids talk about
  • the reason we want a satisfying RAG is that we do not want to choose the documents that will come into the context window
  • the current story is about text, get ready for images, voice and videos
  • large context windows do not assure a level of recall quality
If company documents are the context needed to have a purposeful discussion with LLMs inside a company or if new services or products are built on internal documents, then we need to have new algorithms for an enriched experience with all the company knowledge.

At LightOn, we believe the future of AI retrieval lies in reasoning, not just pattern matching. As Antoine Chaffin explained in his Maven podcast appearance, single-vector embeddings collapse nuance into a single vector, limiting systems to shallow similarity. (Before you read the rest of the blog post, do not hesitate to get in touch if you want to help build this new stack.)

Late-interaction models take a different approach:

  • Every token is preserved as its own vector.
  • Matching happens late, at the interaction stage.
  • The result: deeper semantic understanding and genuine reasoning.
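The three points above can be sketched as a tiny ranking routine: token vectors are kept per document, and the interaction with the query happens only at scoring time. This is an illustration in plain Python with a dot-product similarity; the corpus, the document names, and the function names are hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim(query_tokens, doc_tokens):
    """Late interaction: each query token takes its best-matching document token."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


def rank_documents(query_tokens, corpus):
    """Score every document's stored token vectors and sort by score, best first."""
    scored = [(name, maxsim(query_tokens, tokens)) for name, tokens in corpus.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


query = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
corpus = {
    "covers_both": [[0.9, 0.1, 0.0], [0.0, 0.8, 0.2], [0.0, 0.0, 1.0]],
    "covers_one": [[1.0, 0.0, 0.0]],
    "off_topic": [[0.0, 0.0, 1.0], [0.1, 0.0, 0.9]],
}
ranking = rank_documents(query, corpus)
print(ranking)
```

Documents may have different numbers of token vectors; nothing is pooled before the query arrives.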

This simple but powerful insight has sparked an open-source ecosystem that’s now shaping both academic research and production-scale AI systems.

PyLate: From Experimental Code to Peer-Reviewed Paper

PyLate began as an internal experiment to simplify multi-vector training. Today, it’s a full-fledged library with 527 GitHub stars and growing adoption.

  • Academic recognition: PyLate’s paper was accepted at CIKM 2025 (see below), becoming the first peer-reviewed library dedicated to training ColBERT-style models.
  • Practical impact: Researchers can train a state-of-the-art retrieval model on MS MARCO in under 2 hours with just ~80 lines of code.
  • Real-world benefit: Out-of-domain search, reasoning-heavy tasks, and long-context retrieval become accessible to any team.

If you want to learn more about the library: PyLate documentation

ModernBERT: Re-Imagining the Encoder

In partnership with Answer.AI, LightOn co-developed ModernBERT, a model that fundamentally rethinks encoder architecture.

  • 8192-token context with Flash Attention, running efficiently on consumer GPUs.
  • 1,500 GitHub stars and 27M+ downloads on HuggingFace.
  • Poster presentation at ACL 2025 (Vienna): validation from one of NLP’s most competitive venues.

ModernBERT has already been cited 305+ times, with variants like BioClinical ModernBERT emerging for healthcare applications.

👉 Explore: ModernBERT LightOn blog post

FastPlaid: Performance That Scales

Building great models is only half the challenge; making them work in production is the other. That’s where FastPlaid comes in.

  • A Rust + CUDA engine for multi-vector search.
  • Delivers +554% throughput improvements for multi-vector search compared to Stanford’s PLAID baseline.
  • Designed for scalability: powering recommendation engines, retrieval-augmented generation (RAG), and real-time search.

As Raphael Sourty explains, static indexes solve many use cases, but mutable indexes (new in v1.10.0) unlock real-world applications where data evolves continuously.

👉 Read more: FastPlaid LightOn blogpost

PyLate-rs: Retrieval in the Browser

Finally, to push accessibility even further, PyLate-rs compiles late-interaction inference to WebAssembly (WASM).

That means:

  • Run a state-of-the-art retriever directly in the browser.
  • Achieve 97% faster cold-start performance on CPU.
  • Remove server dependencies entirely.

This lowers the barrier for demos, education, and lightweight deployments, proving late-interaction isn’t just powerful, it’s portable.

From Theory to Production: A Movement

Taken together, these projects form a technical symphony:

  • ModernBERT provides the backbone.
  • PyLate enables fast and easy training of SOTA models.
  • FastPlaid ensures scalable search performance.
  • PyLate-rs brings inference to any environment.

The ecosystem has grown from an academic curiosity into a reasoning-first retrieval stack. With recognition at CIKM and ACL, adoption across GitHub and HuggingFace, and practical tools for real-world workflows, LightOn is helping shape the next era of AI search.





📖 Explore LightOn’s open-source ecosystem:

DatasetPre-training libraries


PyLate: Flexible Training and Retrieval for Late Interaction Models by Antoine Chaffin, Raphaël Sourty
Neural ranking has become a cornerstone of modern information retrieval. While single vector search remains the dominant paradigm, it suffers from the shortcoming of compressing all the information into a single vector. This compression leads to notable performance degradation in out-of-domain, long-context, and reasoning-intensive retrieval tasks. Multi-vector approaches pioneered by ColBERT aim to address these limitations by preserving individual token embeddings and computing similarity via the MaxSim operator. This architecture has demonstrated superior empirical advantages, including enhanced out-of-domain generalization, long-context handling, and performance in complex retrieval scenarios. Despite these compelling empirical results and clear theoretical advantages, the practical adoption and public availability of late interaction models remain low compared to their single-vector counterparts, primarily due to a lack of accessible and modular tools for training and experimenting with such models. To bridge this gap, we introduce PyLate, a streamlined library built on top of Sentence Transformers to support multi-vector architectures natively, inheriting its efficient training, advanced logging, and automated model card generation while requiring minimal code changes to code templates users are already familiar with. By offering multi-vector-specific features such as efficient indexes, PyLate aims to accelerate research and real-world application of late interaction models, thereby unlocking their full potential in modern IR systems. Finally, PyLate has already enabled the development of state-of-the-art models, including GTE-ModernColBERT and Reason-ModernColBERT, demonstrating its practical utility for both research and production environments.

🌐 Learn more about lighton.ai
** Nuit Blanche is now on Twitter: @NuitBlog **
Follow @NuitBlog or join the CompressiveSensing Reddit, the Facebook page, the Compressive Sensing group on LinkedIn  or the Advanced Matrix Factorization group on LinkedIn

Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email.

ModernBERT: Smarter, Better, Faster and with Longer context
CS, LLM, ML



🎄 Just in time for the magical week 🎅: LightOn and Answer.AI just made available a new model called ModernBERT.
ModernBERT is available as a drop-in replacement for any BERT-like model, in both a base (139M params) and a large (395M params) size.
To get a sense of how important the BERT model and its derivatives are, here are some figures:
  • Out of the 1.2 million different models uploaded to HuggingFace since its inception, Google's initial BERT model is the second most downloaded model, with more than 65 million downloads last month.
  • Among the 30 most downloaded models, BERT and related models account for 325 million downloads last month.

We hope the community likes ModernBERT and builds applications that will be smarter 🧠 , better 🛰️ , faster 🚀 and with longer context 🦒 .
Here is the preprint:
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference by Benjamin Warner, Antoine Chaffin, Benjamin Clavié, Orion Weller, Oskar Hallström, Said Taghadouini, Alexis Gallagher, Raja Biswas, Faisal Ladhak, Tom Aarsen, Nathan Cooper, Griffin Adams, Jeremy Howard, Iacopo Poli
Encoder-only transformer models such as BERT offer a great performance-size tradeoff for retrieval and classification tasks with respect to larger decoder-only models. Despite being the workhorse of numerous production pipelines, there have been limited Pareto improvements to BERT since its release. In this paper, we introduce ModernBERT, bringing modern model optimizations to encoder-only models and representing a major Pareto improvement over older encoders. Trained on 2 trillion tokens with a native 8192 sequence length, ModernBERT models exhibit state-of-the-art results on a large pool of evaluations encompassing diverse classification tasks and both single and multi-vector retrieval on different domains (including code). In addition to strong downstream performance, ModernBERT is also the most speed and memory efficient encoder and is designed for inference on common GPUs.
See also
Models:

ModernBERT model was trained smoothly on Orange Business cloud ⛅ in cooperation with Hewlett Packard Enterprise.

(*) The magical weeks are generally the last two weeks of December: Marie Curie discovered radium (Dec 21st), the Wright brothers made their first flight (Dec 17th), Brattain and H. R. Moore demonstrated the transistor (Dec 23rd), and Charles Babbage invented the calculating machine (Dec 26th).

tag:blogger.com,1999:blog-6141980.post-8779642650883269766
Extensions
Large Language Models and Transformers (Videos, Simons Institute for the Theory of Computing)
CSLIghtOnLLMML
Show full content
As some of you may know, LightOn has built a few Large Language Models, and we are now making them usable by Enterprise customers. In the meantime, and on the theoretical side of things, the Simons Institute for the Theory of Computing has organized a workshop on the topic of Large Language Models and Transformers. The program is listed below; each link points to the video of the talk (including the streaming this week).



Monday, Aug. 14, 2023
Tuesday, Aug. 15, 2023
Wednesday, Aug. 16, 2023
Thursday, Aug. 17, 2023
Friday, Aug. 18, 2023



tag:blogger.com,1999:blog-6141980.post-4538057760078621849
Extensions
2021, the year AI ate HPC … and more
CSLIghtOnML
Show full content
Back in 2011, Marc Andreessen announced that software was eating the world while everyone was trying to make sense of the realities of the cloud versus brick and mortar businesses. Eight years later, Tarry Singh articulated how AI was eating software, a year before GPT-3 and Codex would give solid ground to this prediction. Fast forward two years: we just witnessed how AI ate HPC, and we believe those are the first steps towards AI eating Learning, Creative and Office work.
Let me explain.


At LightOn, we have been working on getting AI to be transformative for everyone. For that to happen, we used the Jean Zay French national supercomputer for two different yet somewhat related reasons this past year. First, LightOn’s Optical Processing Unit hardware was integrated into this top105 supercomputer. Even though LightOn’s hardware is analog and uses a technology currently unknown to supercomputing, there are several good reasons the future of computing will use this technology. Relatedly, in a co-design fashion, we also used the Jean Zay facility to implement and run code for the building of Large Language/Foundation Models that we believe are key to Transformative AI. In March, we trained the largest French language model ever, called Auriga, and made it available to everyone through our PAGnol demo.


In July, we launched the Muse API, making our language models available for business use. Initially released in private beta, Muse has quickly gained its first customers, and a public commercial version with five languages is to be released in early 2022. Some of these early customers are using this new AI to redefine SEO or the experience for website creation.

“True happiness comes from the joy of deeds well done, the zest of creating things new” Antoine de Saint-Exupéry



Eventually, a major impact of these Large Language Models trained on HPC infrastructures will be the ability for everyone to personally learn faster and for office workers worldwide to get the job done in a fashion never seen before.


If you are a start-up company or an individual starting a business around this promise, don’t hesitate to join the Muse Partnership program, and let’s start a discussion around how Muse can help you.
These models will also have the same effect in creative work and in the discovery process.
Stay tuned, the true AI revolution is really coming!

 
tag:blogger.com,1999:blog-6141980.post-4804125901373495993
Extensions
LightOn Photonic coprocessor integrated into European AI Supercomputer
AICSlifeLIghtOnML
Show full content
This is history of computing in the making stuff!

Four years ago to the day, LightOn’s first Optical Processing Unit (OPU) had its first light in a Data Center showing that our technology was data center ready.
It is with immense pride and pleasure that we announce that LightOn’s OPU has been installed in one of the world’s Top500 supercomputers as part of a pilot program with GENCI and IDRIS/CNRS.

The team at LightOn is immensely proud to write the future of computing in this world-first integration of a computing photonic device into an HPC infrastructure.
The press release can be found here.
Thank you GENCI and IDRIS/CNRS for making this happen!
 
tag:blogger.com,1999:blog-6141980.post-5682120039051385969
Extensions
The Akronomicon: an Extreme-Scale Leaderboard
CSLIghtOnML
Show full content

As larger models seem to be providing more context and more ability for zero-shot learning, Julien just created the Akronomicon: an Extreme-Scale Leaderboard featuring the world's largest Machine Learning Models. And yes, LightOn is on that board for the moment!
 Want to contribute? https://github.com/lightonai/akronomicon 

 
tag:blogger.com,1999:blog-6141980.post-7677986688083463237
Extensions
Virtual Workshop: Conceptual Understanding of Deep Learning (May 17th 9am-4pm PST)
ML
Show full content

Just got an email from Rina Panigrahy

Hi Igor,
I am an algorithms researcher at Google (http://theory.stanford.edu/~rinap) and I am organizing this workshop on "Conceptual Understanding of Deep Learning" (details below). It's trying to understand the Brain/Mind as an algorithm from a mathematical/theoretical perspective. I believe that a mathematical/algorithmic approach for understanding the Mind is crucial and very much missing. I'd appreciate any help I can get with advertising this on your blog/mailing-lists/twitter.
Best,
Rina

Here is the invite:

Please join us for a virtual Google workshop on “Conceptual Understanding of Deep Learning”
When: May 17th 9am-4pm PST.
Where: Live over YouTube.
Goal: How does the Brain/Mind (perhaps even an artificial one) work at an algorithmic level? While deep learning has produced tremendous technological strides in recent decades, there is an unsettling feeling of a lack of “conceptual” understanding of why it works and to what extent it will work in the current form. The goal of the workshop is to bring together theorists and practitioners to develop an understanding of the right algorithmic view of deep learning, characterizing the class of functions that can be learned, coming up with the right learning architecture that may (provably) learn multiple functions, concepts and remember them over time as humans do, theoretical understanding of language, logic, RL, meta learning and lifelong learning.
The speakers and panelists include Turing award winners Geoffrey Hinton, Leslie Valiant, and Gödel Prize winner Christos Papadimitriou (full-details).
Panel Discussion: There will also be a panel discussion on the fundamental question of “Is there a mathematical model for the Mind?”. We will explore basic questions such as “Is there a provable algorithm that captures the essential capabilities of the mind?”, “How do we remember complex phenomena?”, “How is a knowledge graph created automatically?”, “How do we learn new concepts, function and action hierarchies over time?” and “Why do human decisions seem so interpretable?”
Twitter: #ConceptualDLWorkshop. Please help advertise on mailing-lists/blog-posts and Retweet.

Hope to see you there!
Rina Panigrahy




 
tag:blogger.com,1999:blog-6141980.post-5000655369472513039
Extensions
Randomized Algorithms for Scientific Computing (RASC)
CSLIghtOnMLRandNLAsketching
Show full content
At LightOn, we build photonic hardware that performs random projections and it is nice to find a source of materials on the subject in one document. Here is a report comprehensively presenting how randomized algorithms are key to the future of computing:

Randomized Algorithms for Scientific Computing (RASC) by Aydin Buluc, Tamara G. Kolda, Stefan M. Wild, Mihai Anitescu, Anthony DeGennaro, John Jakeman, Chandrika Kamath, Ramakrishnan (Ramki) Kannan, Miles E. Lopes, Per-Gunnar Martinsson, Kary Myers, Jelani Nelson, Juan M. Restrepo, C. Seshadhri, Draguna Vrabie, Brendt Wohlberg, Stephen J. Wright, Chao Yang, Peter Zwart
Randomized algorithms have propelled advances in artificial intelligence and represent a foundational research area in advancing AI for Science. Future advancements in DOE Office of Science priority areas such as climate science, astrophysics, fusion, advanced materials, combustion, and quantum computing all require randomized algorithms for surmounting challenges of complexity, robustness, and scalability. This report summarizes the outcomes of that workshop, "Randomized Algorithms for Scientific Computing (RASC)," held virtually across four days in December 2020 and January 2021.
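
To give a flavor of what a random projection does, here is a small software sketch in NumPy. It uses a plain Gaussian matrix as a stand-in and is not a model of the photonic hardware; the sizes and variable names are assumptions chosen for illustration. The point it shows is the classic one: projecting points through a fixed random matrix to a lower dimension roughly preserves their pairwise distances.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 20 points, projected from 1000 down to 500 dimensions.
n_points, dim_in, dim_out = 20, 1000, 500
X = rng.standard_normal((n_points, dim_in))

# Fixed Gaussian random matrix, scaled so that squared norms are preserved
# in expectation.
R = rng.standard_normal((dim_in, dim_out)) / np.sqrt(dim_out)
Y = X @ R

def pairwise_distances(A: np.ndarray) -> np.ndarray:
    """Euclidean distance between every pair of rows of A."""
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

d_in = pairwise_distances(X)
d_out = pairwise_distances(Y)

# Compare distances before and after projection, ignoring the zero diagonal.
off_diagonal = ~np.eye(n_points, dtype=bool)
ratios = d_out[off_diagonal] / d_in[off_diagonal]
max_distortion = float(np.abs(ratios - 1.0).max())
```

Increasing `dim_out` tightens the ratios around 1, while shrinking it trades accuracy for a smaller representation.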

 
tag:blogger.com,1999:blog-6141980.post-8408426105722811972
Extensions
The $1,000 GPT-3
CSLIghtOnML
Show full content

Progress usually comes from a steady technology bootstrap…until it doesn’t.

Take for instance the race for the $1,000 genome that started in the early 2000s. Initially, sequencing the human genome meant a race between the well-funded public and private sectors but more importantly, the resources for the first breakthrough ended up costing upwards of $450M. Yet despite all the economic promise of genome sequencing, had Moore’s law been applied, sequencing one full genome would still cost $100,000 today. However, once the goal became clearer to everyone, a diversity of technologies and challengers emerged. This intense competition eventually yielded a growth faster than Moore’s Law. The main takeaway is that one cannot rely on the steady progress of one specific technology alone to commoditize tools.



Figure from NIH “Facts sheets about genomics: The cost of Sequencing a Human Genome”, Dec 7th, 2020.

What does this have to do with the current state of silicon computing and the new demand for Large Language Models (LLMs)? Everything if you ask us and here is how.

Less than a year into existence, Large Language Models like GPT-3 have already spawned a new generation of startups built on the ability of the model to respond to requests for which it was not trained. More importantly for us, hardware manufacturers are positing that one or several customers will be willing to put a billion dollars on the table to train an even larger model in the coming years.

Interestingly, much like the mass industrialization in the 1930s, the good folks at OpenAI are sketching new scaling laws for the industrialization of these larger models.

The sad truth is that extrapolating their findings to the training of a 10-trillion-parameter model involves a supercomputer running continuously for two decades. The minimum capital expenditure of this adventure is estimated to be in the realm of several hundred million dollars.

Much like what happened in sequencing, while silicon improvement and architecture may achieve speedups in the following years, it is fair to say that, even with Moore’s law, no foreseeable technology can reasonably train a fully scaled-up GPT-4 and grab the economic value associated with it.



Rebooting silicon with a different physics, light, and NvNs

For a real breakthrough to occur, much like what happened in the sequencing story, different technologies need to be jointly optimized. In our case, this means performing co-design with new hardware and physics but also going rogue on full programmability.

LightOn’s photonic hardware can produce massively parallel matrix-vector multiplications with an equivalent of 2 trillion parameters “for free”: this is about one-fifth of the number of parameters needed for GPT-4. Next comes revisiting the programmability. LightOn’s current technology keeps these weights fixed by design. Co-design means finding the algorithms for which CPUs and GPUs can perform some of the most intelligent computations while LightOn’s massive Non-von Neumann (NvN) hardware does the heavy lifting. We already published how we are replacing backpropagation, the workhorse of Deep Learning, with an algorithm that unleashes the full potential of our hardware in distributed training. We are also working similarly on an inference step that will take full advantage of the massive number of parameters at our disposal. This effort relies heavily on our access to half a million GPU hours on some of France’s and Europe’s largest supercomputers.

And this is just the beginning. There is a vast untapped potential for repurposing large swaths of optical technologies directed primarily for entertainment and telecommunication into computing.

The road towards a $1,000 GPT-3

Based on the GPT-3 training cost estimates, achieving a $1,000 GPT-3 requires four orders of magnitude improvements. Much like what occurred in 2007 with the genome sequencing revolution, Moore’s law may take care of the first two orders of magnitude in the coming decade but the next two rely on an outburst of new efficient technologies — hardware and algorithms. It just so happens that GPT-3 has close to 100 layers, so achieving two orders of magnitude savings may arise faster than you can imagine. Stay tuned!

Igor Carron is the CEO and co-founder at LightOn


 
tag:blogger.com,1999:blog-6141980.post-6618722984616933274
Extensions
Computing with Light: How LightOn intends to unlock Transformative AI
CSLIghtOnML
Show full content
I gave a talk at the #mathia2021 conference on March 9th, 2021 where I drew a parallel between the scaling laws that enabled industrialization in the 1920s and the new scaling laws in AI of the 2020s. AI is in its infancy: it needs guiding principles (as embedded in these empirical laws) and it also needs new hardware. I showed how, in this context, LightOn can help unlock Transformative AI. Enjoy!


The other presentations, by Yann LeCun, Kathryn Hess, Michael Jordan, Emmanuel Candès and others, can be found in this collection of videos on Vimeo. Let me note that Michael made an argument similar to mine: in terms of industrialization, AI is currently in its infancy.




tag:blogger.com,1999:blog-6141980.post-9148268335129811912
Extensions
Unveiling LightOn Appliance
CSLIghtOnML
Show full content
Today is a big day at LightOn as we unveil a hardware product, the Appliance, the world's first commercially available photonic co-processor for AI and HPC
If you are interested, pre-ordering information is here: http://lighton.ai/lighton-appliance 
We have had a few of these optical processing units in our own LightOn Cloud for the past two years and just retired one after more than 800 days working full time.  


Here is the press release
The future is now! 

Leasing starts at 1900€/month or about US$2250/month 





tag:blogger.com,1999:blog-6141980.post-135472218246721484
Extensions
Video: LightOn unlocks Transformative AI
LIghtOn
Show full content
In the coming days, we'll be making another announcement but I wanted to first share a video we did recently. At LightOn, we don't build photonic computing hardware because it's fancy or cool (even though, it is cool) but because computing hardware is hitting the limits. I know what some say about Moore's law not being dead but the recent focus on Transformers and their attendant scaling laws makes it obvious that in order for more people to have access to these models, we need a new computing paradigm. Indeed not everyone can afford to spend a billion dollars in training these models. As Azeem was recently pointing out in one of his newsletters, this is how bad things will become:
The amazing thing is that we can start to compare the cost of training single AI models with the cost of building the physical fabs that make chips. TSMC’s state-of-the-art 3nm fab will run to around $20bn when it is completed in two years. A fab like this may be competitive for 5-7 years, which means it’ll need to churn out $7-8m worth of chips every day before it pays back.
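
As a back-of-the-envelope check of the figures in that quote, the sketch below simply divides the quoted fab cost by the number of days in the quoted competitive window. It ignores operating costs, margins and discounting, and the helper name `daily_revenue_to_pay_back` is hypothetical. Under those assumptions, the quoted $7-8m per day corresponds to the upper end (about 7 years) of the 5-7 year window.

```python
def daily_revenue_to_pay_back(cost_usd: float, years: float) -> float:
    """Revenue per day needed to recoup cost_usd over the given number of
    years, ignoring operating costs, margins and discounting."""
    return cost_usd / (years * 365)

fab_cost_usd = 20e9  # fab cost quoted above

# Competitive window quoted above: 5 to 7 years.
for years in (5, 6, 7):
    per_day = daily_revenue_to_pay_back(fab_cost_usd, years)
    print(f"{years} years: ${per_day / 1e6:.1f}M of chips per day")
```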

And so at LightOn, we think that a combination of algorithms and (cool) hardware is the only pathway forward for computing large-scale AI. The video is right here, enjoy!






 
tag:blogger.com,1999:blog-6141980.post-4669823400710286896
Extensions
The Awesome Implicit Neural Representations Highly Technical Reference Page
HighlyTechnicalReferencePageML
Show full content
Here is a new curated page on the topic of Implicit Neural Representations aptly called Awesome Implicit Neural Representations. It is curated by Vincent Sitzmann (@vincesitzmann) and has been added to the Highly Technical Reference Page:




From the page:

A curated list of resources on implicit neural representations, inspired by awesome-computer-vision. Work-in-progress.

This list does not aim to be exhaustive, as implicit neural representations are a rapidly evolving & growing research field with hundreds of papers to date.

Instead, this list aims to list papers introducing key concepts & foundations of implicit neural representations across applications. It's a great reading list if you want to get started in this area!

For most papers, there is a short summary of the most important contributions.

Disclosure: I am an author on the following papers:

What are implicit neural representations?

Implicit Neural Representations (sometimes also referred to as coordinate-based representations) are a novel way to parameterize signals of all kinds. Conventional signal representations are usually discrete - for instance, images are discrete grids of pixels, audio signals are discrete samples of amplitudes, and 3D shapes are usually parameterized as grids of voxels, point clouds, or meshes. In contrast, Implicit Neural Representations parameterize a signal as a continuous function that maps the domain of the signal (i.e., a coordinate, such as a pixel coordinate for an image) to whatever is at that coordinate (for an image, an R,G,B color). Of course, these functions are usually not analytically tractable - it is impossible to "write down" the function that parameterizes a natural image as a mathematical formula. Implicit Neural Representations thus approximate that function via a neural network.

Why are they interesting?

Implicit Neural Representations have several benefits: First, they are not coupled to spatial resolution anymore, the way, for instance, an image is coupled to the number of pixels. This is because they are continuous functions! Thus, the memory required to parameterize the signal is independent of spatial resolution, and only scales with the complexity of the underlying signal. Another corollary of this is that implicit representations have "infinite resolution" - they can be sampled at arbitrary spatial resolutions.

This is immediately useful for a number of applications, such as super-resolution, or in parameterizing signals in 3D and higher dimensions, where memory requirements grow intractably fast with spatial resolution.

However, in the future, the key promise of implicit neural representations lies in algorithms that directly operate in the space of these representations. In other words: What's the "convolutional neural network" equivalent of a neural network operating on images represented by implicit representations? Questions like these offer a path towards a class of algorithms that are independent of spatial resolution!..........
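
Stepping outside the excerpt for a moment, the resolution-independence argument can be made concrete with a toy example. The sketch below is an assumption-laden illustration, not code from the curated page: it uses a tiny, untrained coordinate network with random weights and a sine activation (a real implicit representation would be fitted to a signal). It shows that the same fixed set of parameters can be sampled as an 8×8 or a 64×64 image.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny, untrained coordinate network: (x, y) in [0, 1]^2 -> (R, G, B).
# Weights are random here; in practice they would be fitted to a signal.
W1 = rng.standard_normal((2, 32))
b1 = rng.standard_normal(32)
W2 = rng.standard_normal((32, 3)) / np.sqrt(32)
b2 = np.zeros(3)

def signal(coords: np.ndarray) -> np.ndarray:
    """Continuous function from coordinates (N, 2) to colors (N, 3)."""
    hidden = np.sin(coords @ W1 + b1)
    return 1.0 / (1.0 + np.exp(-(hidden @ W2 + b2)))  # squash into (0, 1)

def sample_image(height: int, width: int) -> np.ndarray:
    """Evaluate the same function on a grid of any resolution."""
    ys, xs = np.meshgrid(
        np.linspace(0.0, 1.0, height), np.linspace(0.0, 1.0, width), indexing="ij"
    )
    coords = np.stack([xs, ys], axis=-1).reshape(-1, 2)
    return signal(coords).reshape(height, width, 3)

low = sample_image(8, 8)
high = sample_image(64, 64)

# Memory footprint of the representation: independent of the sampling grid.
n_params = W1.size + b1.size + W2.size + b2.size
```

Both images come from the same parameters, so the storage cost is set by the size of the network rather than by the number of pixels sampled.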


h/t Shubhendu Trivedi (@_onionesque)


tag:blogger.com,1999:blog-6141980.post-1670876390213134593
Extensions
Hardware Beyond Backpropagation: a Photonic Co-Processor for Direct Feedback Alignment
CSLIghtOnML
Show full content
We presented this work at the Beyond Backpropagation workshop at NeurIPS. A great conjunction between computational hardware and algorithm! 


Hardware Beyond Backpropagation: a Photonic Co-Processor for Direct Feedback Alignment by Julien Launay, Iacopo Poli, Kilian Müller, Gustave Pariente, Igor Carron, Laurent Daudet, Florent Krzakala, Sylvain Gigan
The scaling hypothesis motivates the expansion of models past trillions of parameters as a path towards better performance. Recent significant developments, such as GPT-3, have been driven by this conjecture. However, as models scale up, training them efficiently with backpropagation becomes difficult. Because model, pipeline, and data parallelism distribute parameters and gradients over compute nodes, communication is challenging to orchestrate: this is a bottleneck to further scaling. In this work, we argue that alternative training methods can mitigate these issues, and can inform the design of extreme-scale training hardware. Indeed, using a synaptically asymmetric method with a parallelizable backward pass, such as Direct Feedback Alignment, communication needs are drastically reduced. We present a photonic accelerator for Direct Feedback Alignment, able to compute random projections with trillions of parameters. We demonstrate our system on benchmark tasks, using both fully-connected and graph convolutional networks. Our hardware is the first architecture-agnostic photonic co-processor for training neural networks. This is a significant step towards building scalable hardware, able to go beyond backpropagation, and opening new avenues for deep learning.
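
For readers unfamiliar with Direct Feedback Alignment, here is a minimal software sketch in NumPy. It is not the paper's code and does not model the photonic co-processor: the toy regression data, layer sizes, learning rate and the scaling of the feedback matrices are all assumptions made for illustration. What it shows is the core idea: each hidden layer receives the output error through its own fixed random matrix, so the hidden-layer updates do not wait on one another.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (illustrative only).
X = rng.standard_normal((256, 10))
y = np.tanh(X @ rng.standard_normal((10, 1)))

# Network with two hidden layers; sizes are arbitrary.
sizes = [10, 64, 64, 1]
W = [rng.standard_normal((m, n)) / np.sqrt(m) for m, n in zip(sizes[:-1], sizes[1:])]

# Fixed random feedback matrices (output error -> each hidden layer).
# They are never trained; in the paper this random projection is what the
# optical hardware computes, here it is simulated in software.
B = [rng.standard_normal((sizes[-1], n)) / np.sqrt(n) for n in sizes[1:-1]]

def forward(inputs):
    h1 = np.tanh(inputs @ W[0])
    h2 = np.tanh(h1 @ W[1])
    return h1, h2, h2 @ W[2]

def mse(inputs, targets):
    return float(np.mean((forward(inputs)[2] - targets) ** 2))

loss_before = mse(X, y)
lr = 0.02
for _ in range(500):
    h1, h2, out = forward(X)
    e = (out - y) / len(X)  # output error, averaged over the batch

    # DFA: project the output error straight to each hidden layer with a
    # fixed random matrix instead of backpropagating through W[2] and W[1].
    d2 = (e @ B[1]) * (1.0 - h2 ** 2)
    d1 = (e @ B[0]) * (1.0 - h1 ** 2)

    W[2] -= lr * h2.T @ e   # the output layer uses its true gradient
    W[1] -= lr * h1.T @ d2
    W[0] -= lr * X.T @ d1
loss_after = mse(X, y)
```

Because `d1` and `d2` depend only on the output error and on local activations, the two hidden-layer updates could be computed in parallel, which is the property the abstract highlights.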



tag:blogger.com,1999:blog-6141980.post-144723434200318029
Extensions
Diffraction-unlimited imaging based on conventional optical devices
CSML
Show full content

Aurélien sent me an email back in October and we are now in December! Time flies.
Dear Igor,

I hope things are well.
I have been following your NuitBlanche blog for quite a few years. It would thus be great for us if you consider a recent paper of ours to appear in your blog, entitled “Diffraction-unlimited imaging based on conventional optical devices”. This paper has been published in Optics Express this year and its link is: https://www.osapublishing.org/oe/abstract.cfm?uri=oe-28-8-11243
This manuscript proposes a new imaging paradigm for objects that are too far away to be illuminated or accessed, which allows them to be resolved beyond the limit of diffraction---which is thus distinct from the microscopy setting. Our concept involves an easy-to-implement acquisition procedure where a spatial light modulator (SLM) is placed some distance from a conventional optical device. After acquisition of a sequence of images for different SLM patterns, the object is reconstructed numerically. The key novelty of our acquisition approach is to ensure that the SLM modulates light before information is lost due to diffraction.
Feel free to let us know what you think, and happy to provide more information/pictures if needed. Thanks a lot for your time and consideration!

Best regards,
Aurélien Bourquard

Thank you Aurélien

 Here is the paper's abstract:



Diffraction-unlimited imaging based on conventional optical devices by Nicolas Ducros and Aurélien Bourquard
We propose a computational paradigm where off-the-shelf optical devices can be used to image objects in a scene well beyond their native optical resolution. By design, our approach is generic, does not require active illumination, and is applicable to several types of optical devices. It only requires the placement of a spatial light modulator some distance from the optical system. In this paper, we first introduce the acquisition strategy together with the reconstruction framework. We then conduct practical experiments with a webcam that confirm that this approach can image objects with substantially enhanced spatial resolution compared to the performance of the native optical device. We finally discuss potential applications, current limitations, and future research directions.

I note that Aurélien has also published some exciting research on Differential Imaging Forensics. His co-author Nicolas also has some interesting work on single-pixel cameras.



Follow @NuitBlog or join the CompressiveSensing Reddit, the Facebook page, the Compressive Sensing group on LinkedIn  or the Advanced Matrix Factorization group on LinkedIn
Liked this entry ? subscribe to Nuit Blanche's feed, there's more where that came from. You can also subscribe to Nuit Blanche by Email.
Other links: Paris Machine LearningMeetup.com||@Archives||LinkedIn||Facebook|| @ParisMLGroup About LightOnNewsletter ||@LightOnIO|| on LinkedIn || on CrunchBase || our BlogAbout myselfLightOn || Google Scholar || LinkedIn ||@IgorCarron ||Homepage||ArXiv
tag:blogger.com,1999:blog-6141980.post-5445401002708754747
Extensions
LightOn at #NeurIPS2020
CS, LIghtOn, ML
** Nuit Blanche is now on Twitter: @NuitBlog ** 

I posted the following on LightOn's Blog.



We live in interesting times!
The combination of the post-Moore’s law era and the advent of very large ML models requires all of us to think up new approaches to computing hardware and AI algorithms at the same time. LightOn is one of the few (20) companies in the world publishing in both AI and hardware venues, in order to engage both communities in thinking about how theories and workflows may eventually be transformed by the photonic technology we develop.
This year, thanks to the awesome Machine Learning team at LightOn, we have two accepted papers at NeurIPS, the flagship AI conference, and five papers in its “Beyond Backpropagation” satellite workshop that will take place on Saturday. This is significant on many levels, not the least being that these papers have been nurtured and spearheaded by two Ph.D. students (Ruben Ohana and Julien Launay) who are doing their theses as LightOn engineers.
Here is the list of the different papers accepted at NeurIPS this year that involved LightOn members:

And at the NeurIPS Beyond Backpropagation workshop taking place on Saturday, December 12:

  • Hardware Beyond Backpropagation: a Photonic Co-Processor for Direct Feedback Alignment, Julien Launay, Iacopo Poli, Kilian Muller, Igor Carron, Laurent Daudet, Florent Krzakala, Sylvain Gigan.
  • Direct Feedback Alignment Scales to Modern Deep Learning Tasks and Architectures, Julien Launay, François Boniface, Iacopo Poli, Florent Krzakala (Presenter: Julien Launay).
  • Ignorance is Bliss: Adversarial Robustness by Design through Analog Computing and Synaptic Asymmetry, Alessandro Cappelli, Ruben Ohana, Julien Launay, Iacopo Poli, Florent Krzakala (Presenter: Alessandro Cappelli). We had a blog post on this recently.
  • Align, then Select: Analysing the Learning Dynamics of Feedback Alignment, Maria Refinetti, Stéphane d’Ascoli, Ruben Ohana, Sebastian Goldt (Presenter: Ruben Ohana).
  • How and When does Feedback Alignment Work, Stéphane d’Ascoli, Maria Refinetti, Ruben Ohana, Sebastian Goldt (Presenter: Ruben Ohana).

Some of these presentations are given in French at the “Déjeuners virtuels de NeurIPS”

tag:blogger.com,1999:blog-6141980.post-8356505701362923811
Extensions
Weight Agnostic Neural Networks, a virtual presentation by Adam Gaier, Thursday October 15th, LightOn AI meetup #7
LIghtOn, LightOnAIMeetup, ML
Ever since we started LightOn, we have put some emphasis on having great minds think about which new algorithms are possible and how they can be enabled by our photonic chips. We also hold a regular meetup where we see how other great minds are devising new algorithms.

Tomorrow, Thursday, October 15th, we continue this journey with Adam Gaier, who will talk to us about Weight Agnostic Neural Networks. The virtual meetup will start at:
  • 16:00 (UTC+2) Paris time but also 
  • 7AM PST, 
  • 10AM CST, 
  • 11PM JST. 
To have more information about connecting to the meetup, please register here: https://meetup.com/LightOn-meetup/events/273660363/


tag:blogger.com,1999:blog-6141980.post-3548381289086139579
Extensions
As The World Turns: Implementations now on ArXiv thanks to Paper with Code
implementation


It's the little things. 
In the 2000s, after featuring good work on Nuit Blanche, I would usually follow up by asking the authors where their code was. This is how the implementation tag was born. Some of the answers were along the lines of: "I didn't make it available because I thought it was not worthy". What I usually responded was that, in effect, releasing one's code had a compounding effect on the community: 
"You may not think it's worthy of release, but somehow, someone somewhere needs your code for reasons you cannot fathom"

As a result, I made a conscious choice to feature papers that came with their implementations. The earliest such post dates from February 28th, 2007, and featured three different implementations of reconstruction solvers for compressed sensing. Yes, implementations were already available before that, but within the compressive sensing community, it was the point in time when there was a collective realization that releasing one's code would lead others to reuse one's work and advance the field as a result. At some point, I started compiling a long list of available implementations but got swamped after a while because releasing code had become, most of the time, the default behavior (a good thing).
Five years ago, Samim Winiger started GitXiv around Machine Learning papers. I was ecstatic but the site eventually stopped working. Two years ago, the Papers with Code site started around the same issue and flourished. Congratulations to Robert, Ross, Marcin, Viktor, and Ludovic on starting a vibrant community around this need for listing papers with their attendant code. Two days ago, the next logical step occurred with the featuring of code within ArXiv, a fantastic advance for Science. Woohoo!
Congratulations to Robert, Ross, Marcin, Viktor, and Ludovic on making this happen!



My next question is: When are they going to get a prize for this?

tag:blogger.com,1999:blog-6141980.post-8116161492644813666
Extensions
Photonic Computing for Massively Parallel AI is out and it is spectacular!
LIghtOn, ML



It’s been a long time brewing, but we just released our first white paper on Photonic Computing for Massively Parallel AI. The document features the technology we develop at LightOn, some of its uses, some testimonials, and how we see the future of computing. It is downloadable here or from our website: LightOn.ai
Enjoy!


tag:blogger.com,1999:blog-6141980.post-4775363204618606777
Extensions
Tackling Reinforcement Learning with the Aurora OPU
CS, LIghtOn, ML



Martin Graive did an internship at LightOn and decided to investigate how to use Random Projections in the context of Reinforcement Learning. He just wrote a blog post on the matter entitled "Tackling Reinforcement Learning with the Aurora OPU". The attendant GitHub repo is located here. Enjoy!



tag:blogger.com,1999:blog-6141980.post-7155587677080255056
Extensions
3-year PhD studentship in Inverse Problems and Optical Computing, LightOn, Paris, France
CS, LIghtOn, ML

Come and join us at LightOn: we have a 3-year PhD fellowship available for someone who can help us build our future photonic cores. Here is the announcement:

As part of the newly EU-funded ITN project “Post-Digital”, LightOn has an opening for a fully funded 3-year Ph.D. studentship to join its R&D team, at the crossroads between Computer Science and Physics.
The goal of this 3-year Ph.D. position is to theoretically, numerically, and experimentally investigate how optimization techniques can be used in the design of hybrid computing pipelines, including a number of photonic building blocks (“photonic cores”). In particular, the optimized networks will be used to solve large-scale physics-based inverse problems in science and engineering, for instance in medical imaging (e.g. ultrasound) or simulation problems. The candidate will first investigate how LightOn’s current range of photonic co-processors can be integrated within task-specific networks. The candidate will then develop a computational framework for the optimization of electro-optical systems. Finally, optimized systems will be built and evaluated on experimental data. This project will be part of LightOn’s internal THEIA project, aiming at automating the design of hybrid computing architectures, including combinations of LightOn’s photonic cores and traditional silicon chips.
In the framework of the EU funded ITN Post-Digital network, this project involves collaborations and 3-month secondments with two research groups led by:
  • Daniel Brunner (Université Bourgogne Franche-Comté / FEMTO-ST Besançon), who will be the academic supervisor - The candidate will be registered as a Ph.D. student at UBFC.
  • Pieter Bienstman (IMEC, Leuven, Belgium).
The supervisor at LightOn will be Laurent Daudet, CTO - currently on leave from his position of professor of physics at Université de Paris.
Due to the EU funding source, please make sure you comply with the mobility and eligibility rules before applying. Application: Position to be filled no later than Sept 1st, 2020.
Send your application with a CV to jobs@lighton.io with [Post-Digital PhD] in the subject line. Shortlisted applicants will be asked to provide references. This project has received funding from the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No 860830.
For more information: https://lighton.ai/careers/

tag:blogger.com,1999:blog-6141980.post-6323980194649763299
Extensions
LightOn Cloud 2.0 featuring LightOn Aurora OPUs
LIghtOn, random projections, RandomFeatures



At LightOn, we just launched LightOn Cloud 2.0, which features several Aurora Optical Processing Units (OPUs) for use by the Machine Learning community. The blog post about this can be found here. You can request access to the Cloud at https://cloud.lighton.ai/
We also have a LightOn Cloud for Research program: https://cloud.lighton.ai/lighton-research/




tag:blogger.com,1999:blog-6141980.post-3762528155001507201
Extensions