publications | Georgy Noarov

Below is a list of my journal and conference publications and preprints in reverse chronological order. You can also check out my Google Scholar profile.

2025

Foundations of Top-\(k\) Decoding For Language Models

Georgy Noarov, Soham Mallick, Tao Wang, Sunay Joshi, Yan Sun, Yangxinyu Xie, Mengxin Yu, and Edgar Dobriban

Manuscript 2025


Top-\(k\) decoding is a widely used method for sampling from LLMs: at each token, only the largest \(k\) next-token probabilities are kept, and the next token is sampled after re-normalizing them to sum to unity. Top-\(k\) and other sampling methods are motivated by the intuition that true next-token distributions are sparse, and the noisy LLM probabilities need to be truncated. However, to our knowledge, a precise theoretical motivation for the use of top-\(k\) decoding is missing. In this work, we develop a theoretical framework that both explains and generalizes top-\(k\) decoding. We view decoding at a fixed token as the recovery of a sparse probability distribution. We consider Bregman decoders obtained by minimizing a separable Bregman divergence (for both the primal and dual cases) with a sparsity-inducing \(\ell_0\)-regularization. Despite the combinatorial nature of the objective, we show how to optimize it efficiently for a large class of divergences. We show that the optimal decoding strategies are greedy, and further that the loss function is discretely convex in \(k\), so that binary search provably and efficiently finds the optimal \(k\). We show that top-\(k\) decoding arises as a special case for the KL divergence, and identify new decoding strategies that have distinct behaviors (e.g., non-linearly up-weighting larger probabilities after re-normalization).
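The standard top-\(k\) rule described above (keep the \(k\) largest next-token probabilities, renormalize them to sum to one, then sample) fits in a few lines of Python. This is a minimal sketch of plain top-\(k\) sampling over a list of probabilities, not the Bregman decoders studied in the paper. The function names `top_k_renormalize` and `sample_top_k` are hypothetical, and ties are broken toward the lower index by assumption.

```python
import random

def top_k_renormalize(probs, k):
    """Keep the k largest probabilities, zero out the rest, and rescale to sum to 1."""
    if not 1 <= k <= len(probs):
        raise ValueError("k must be between 1 and len(probs)")
    # Indices of the k largest entries (ties broken toward the lower index).
    keep = sorted(range(len(probs)), key=lambda i: (-probs[i], i))[:k]
    mass = sum(probs[i] for i in keep)
    out = [0.0] * len(probs)
    for i in keep:
        out[i] = probs[i] / mass
    return out

def sample_top_k(probs, k, rng=random):
    """Draw one token index from the renormalized top-k distribution."""
    truncated = top_k_renormalize(probs, k)
    return rng.choices(range(len(truncated)), weights=truncated, k=1)[0]

p = [0.5, 0.2, 0.15, 0.1, 0.05]  # toy next-token distribution over 5 tokens
print(top_k_renormalize(p, 2))   # first two entries about 0.714 and 0.286, the rest 0.0
print(sample_top_k(p, 2))        # always 0 or 1
```

With \(k\) equal to the vocabulary size the rule returns the original distribution unchanged. Smaller \(k\) concentrates all mass on the retained tokens in proportion to their original probabilities.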
Stronger Neyman Regret Guarantees for Adaptive Experimental Design

Georgy Noarov, Riccardo Fogliato, Martin Bertran, and Aaron Roth

42nd International Conference on Machine Learning (ICML) 2025

Selected as a Spotlight Presentation


We study the design of adaptive, sequential experiments for unbiased average treatment effect (ATE) estimation in the design-based potential outcomes setting. Our goal is to develop adaptive designs offering sublinear Neyman regret, meaning that their efficiency must approach that of the hindsight-optimal non-adaptive design. Recent work of Dai et al. [2023] introduced ClipOGD, the first method achieving \(\widetilde{O}(\sqrt{T})\) expected Neyman regret under mild conditions. In this work, we propose adaptive designs with substantially stronger Neyman regret guarantees. In particular, we modify ClipOGD to obtain anytime \(\widetilde{O}(\log T)\) Neyman regret under natural boundedness assumptions. Further, in the setting where experimental units have pre-treatment covariates, we introduce and study a class of contextual "multigroup" Neyman regret guarantees: given any set of possibly overlapping groups based on the covariates, the adaptive design outperforms the best non-adaptive design for each group. In particular, we develop a contextual adaptive design with \(\widetilde{O}(\sqrt{T})\) anytime multigroup Neyman regret. We empirically validate the proposed designs through an array of experiments.
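For readers new to the design-based setting, the sketch below shows the two basic objects this abstract compares: the inverse-propensity (Horvitz-Thompson) ATE estimate computed from per-unit treatment probabilities, and the treatment probability of the best fixed Bernoulli design in hindsight. It assumes a Bernoulli design in which unit \(t\) is treated with probability \(p_t\), and uses the standard calculation that, for a fixed \(p\), the \(p\)-dependent part of the estimator's variance is proportional to \(S_1/p + S_0/(1-p)\) with \(S_1 = \sum_t y_t(1)^2\) and \(S_0 = \sum_t y_t(0)^2\). Setting the derivative to zero gives \(p^* = \sqrt{S_1}/(\sqrt{S_1} + \sqrt{S_0})\). This is neither ClipOGD nor the paper's designs, and the function names are hypothetical.

```python
import math

def ht_ate_estimate(outcomes, treated, probs):
    """Horvitz-Thompson (inverse-propensity) ATE estimate.

    outcomes[t]: observed outcome of unit t
    treated[t]:  True if unit t was assigned to treatment
    probs[t]:    probability with which unit t was treated, 0 < p < 1
    """
    total = 0.0
    for y, z, p in zip(outcomes, treated, probs):
        total += y / p if z else -y / (1.0 - p)
    return total / len(outcomes)

def neyman_optimal_probability(y1, y0):
    """Best fixed treatment probability in hindsight for a Bernoulli design.

    Minimizes S1/p + S0/(1-p), the p-dependent part of the HT variance,
    where S1 = sum of y(1)^2 and S0 = sum of y(0)^2 over all units.
    Assumes at least one potential outcome is nonzero.
    """
    root_s1 = math.sqrt(sum(v * v for v in y1))
    root_s0 = math.sqrt(sum(v * v for v in y0))
    return root_s1 / (root_s1 + root_s0)

print(ht_ate_estimate([3.0, 1.0], [True, False], [0.5, 0.5]))  # 2.0
print(neyman_optimal_probability([2.0, 2.0], [1.0, 1.0]))      # about 0.667
```

The estimate stays unbiased even when each \(p_t\) is chosen adaptively, as long as \(p_t\) is fixed before unit \(t\)'s assignment is drawn. Adaptivity therefore affects only the variance, which is what Neyman regret measures against the hindsight choice \(p^*\).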
High-Dimensional Prediction for Sequential Decision Making

Georgy Noarov, Ramya Ramalingam, Aaron Roth, and Stephan Xie

42nd International Conference on Machine Learning (ICML) 2025

Selected as a Spotlight Presentation


We give an efficient algorithm for producing multi-dimensional forecasts in an online adversarial environment that have low bias subject to any polynomial number of conditioning events, which can depend both on external context and on our predictions themselves. We demonstrate the use of this algorithm with several applications. We show how to make predictions that can be transparently consumed by any polynomial number of downstream decision makers with different utility functions, guaranteeing them diminishing swap regret at optimal rates. We also give the first efficient algorithms for guaranteeing diminishing conditional regret in online combinatorial optimization problems for an arbitrary polynomial number of conditioning events, i.e., on an arbitrary number of intersecting subsequences determined both by context and by our own predictions. Finally, we give the first efficient algorithm for online multicalibration with \(O(T^{2/3})\) rates in the ECE metric.
@article{noarov2023high, bibtex_show = {true}, title = {High-Dimensional Prediction for Sequential Decision Making}, author = {Noarov, Georgy and Ramalingam, Ramya and Roth, Aaron and Xie, Stephan}, arxiv = {2310.17651}, journal = {42nd International Conference on Machine Learning (ICML)}, prize = {Selected as a Spotlight Presentation}, year = {2025} }

2023

The Scope of Multicalibration: Characterizing Multicalibration via Property Elicitation

Georgy Noarov and Aaron Roth

40th International Conference on Machine Learning (ICML) 2023


We make a connection between multicalibration and property elicitation and show that (under mild technical conditions) it is possible to produce a multicalibrated predictor for a continuous scalar distributional property \(\Gamma\) if and only if \(\Gamma\) is elicitable. On the negative side, we show that for non-elicitable continuous properties there exist simple data distributions on which even the true distributional predictor is not calibrated. On the positive side, for elicitable \(\Gamma\), we give simple canonical algorithms for the batch and the online adversarial settings that learn a \(\Gamma\)-multicalibrated predictor. This generalizes past work on multicalibrated means and quantiles, and in fact strengthens existing online quantile multicalibration results. To further counterbalance our negative result, we show that if a property \(\Gamma_1\) is not elicitable by itself, but is elicitable conditionally on another elicitable property \(\Gamma_0\), then there is a canonical algorithm that jointly multicalibrates \(\Gamma_1\) and \(\Gamma_0\); this generalizes past work on mean-moment multicalibration. Finally, as applications of our theory, we provide novel algorithmic and impossibility results for fair (multicalibrated) risk assessment.
@article{Noarov2023ScopeMC, bibtex_show = {true}, title = {The Scope of Multicalibration: Characterizing Multicalibration via Property Elicitation}, author = {Noarov, Georgy and Roth, Aaron}, arxiv = {2302.08507}, journal = {40th International Conference on Machine Learning (ICML)}, year = {2023} }
Batch Multivalid Conformal Prediction

Christopher Jung, Georgy Noarov, Ramya Ramalingam, and Aaron Roth

11th International Conference on Learning Representations (ICLR) 2023


We develop fast distribution-free conformal prediction algorithms for obtaining multivalid coverage on exchangeable data in the batch setting. Multivalid coverage guarantees are stronger than marginal coverage guarantees in two ways: (1) They hold even conditional on group membership; that is, the target coverage level \(1-\alpha\) holds conditionally on membership in each group of an arbitrary finite collection \(G\) of (potentially intersecting) regions of the feature space. (2) They hold even conditional on the value of the threshold used to produce the prediction set on a given example. In fact, multivalid coverage guarantees hold even when conditioning on group membership and threshold value simultaneously. We give two algorithms: both take as input an arbitrary non-conformity score and an arbitrary collection of possibly intersecting groups \(G\), and can then equip arbitrary black-box predictors with prediction sets. Our first algorithm (BatchGCP) is a direct extension of quantile regression, needs to solve only a single convex minimization problem, and produces an estimator which has group-conditional guarantees for each group in \(G\). Our second algorithm (BatchMVP) is iterative, and gives the full guarantees of multivalid conformal prediction: prediction sets that are valid conditionally both on group membership and on the non-conformity threshold. We evaluate the performance of both of our algorithms in an extensive set of experiments.
@article{Jung2022BatchMC, bibtex_show = {true}, title = {Batch Multivalid Conformal Prediction}, author = {Jung, Christopher and Noarov, Georgy and Ramalingam, Ramya and Roth, Aaron}, arxiv = {2209.15145}, journal = {11th International Conference on Learning Representations (ICLR)}, year = {2023}, code = {https://github.com/ProgBelarus/BatchMultivalidConformal}, preview = {conformal.png} }
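To make group-conditional coverage concrete, here is a simple conservative baseline. It is neither BatchGCP nor BatchMVP. The baseline computes a split-conformal threshold separately on each group's calibration scores and gives a test point the largest threshold among the groups that contain it. Each group then keeps at least its own split-conformal coverage, at the cost of larger prediction sets for points lying in many groups. The sketch assumes that groups are given as predicate functions and that a candidate label enters the prediction set when its non-conformity score is at most the threshold. All names are hypothetical.

```python
import math

def split_conformal_threshold(scores, alpha):
    """Finite-sample split-conformal threshold: the ceil((n+1)(1-alpha))-th smallest score."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if n == 0 or rank > n:
        return math.inf  # too few calibration points: include every label
    return sorted(scores)[rank - 1]

def group_thresholds(xs, scores, groups, alpha):
    """One split-conformal threshold per group, using only that group's calibration points."""
    return {
        name: split_conformal_threshold([s for x, s in zip(xs, scores) if member(x)], alpha)
        for name, member in groups.items()
    }

def threshold_for(x, thresholds, groups):
    """Conservative threshold for x: the largest threshold among the groups containing x."""
    relevant = [thresholds[name] for name, member in groups.items() if member(x)]
    return max(relevant) if relevant else math.inf

# Toy calibration data: even-indexed points have systematically larger scores.
xs = list(range(20))
scores = [2.0 * x if x % 2 == 0 else float(x) for x in xs]
groups = {"all": lambda x: True, "even": lambda x: x % 2 == 0}
th = group_thresholds(xs, scores, groups, alpha=0.2)
print(th)                                                          # {'all': 24.0, 'even': 32.0}
print(threshold_for(3, th, groups), threshold_for(4, th, groups))  # 24.0 32.0
```

A single marginal threshold (24.0 here) would under-cover the "even" group, whose scores run higher. Taking the maximum fixes this, but only conservatively. Avoiding that conservativeness for intersecting groups is the point of solving a single quantile-regression problem.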

2022

Practical Adversarial Multivalid Conformal Prediction

Osbert Bastani, Varun Gupta, Christopher Jung, Georgy Noarov, Ramya Ramalingam, and Aaron Roth

36th Conference on Neural Information Processing Systems (NeurIPS) 2022

Selected as an Oral Presentation


We give a simple, generic conformal prediction method for sequential prediction that achieves target empirical coverage guarantees against adversarially chosen data. It is computationally lightweight (comparable to split conformal prediction) but does not require a held-out validation set, so all data can be used for training the models from which a conformal score is derived. It gives stronger-than-marginal coverage guarantees in two ways. First, it gives threshold-calibrated prediction sets that have correct empirical coverage even conditional on the threshold used to form the prediction set from the conformal score. Second, the user can specify an arbitrary collection of (possibly intersecting) subsets of the feature space, and the coverage guarantees also hold conditional on membership in each of these subsets. We call our algorithm MVP, short for MultiValid Prediction. We give both theory and an extensive set of empirical evaluations.
@article{Bastani2022PracticalAM, selected = {true}, bibtex_show = {true}, title = {Practical Adversarial Multivalid Conformal Prediction}, author = {Bastani, Osbert and Gupta, Varun and Jung, Christopher and Noarov, Georgy and Ramalingam, Ramya and Roth, Aaron}, video = {https://slideslive.com/38991952}, arxiv = {2206.01067}, journal = {36th Conference on Neural Information Processing Systems (NeurIPS)}, year = {2022}, code = {https://github.com/ProgBelarus/MultiValidPrediction}, preview = {conformal.png}, slides = {MVP_Slides_Conf.pdf}, prize = {Selected as an Oral Presentation} }
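For contrast with MVP, the simplest online way to hit a coverage target against arbitrary data is to update a single threshold by online subgradient descent on the pinball loss at level \(1-\alpha\), as in adaptive conformal inference. The sketch below implements only that marginal baseline, with neither threshold calibration nor group-conditional guarantees. After a miss (score above the threshold) the threshold rises by \(\eta(1-\alpha)\). After a cover it falls by \(\eta\alpha\). Summing the updates gives \(\sum_t (\mathrm{miss}_t - \alpha) = (\tau_T - \tau_0)/\eta\), so if the thresholds stay bounded, the miscoverage rate converges to \(\alpha\). The step size `eta`, the initial threshold, and the function name are assumptions.

```python
import random

def online_quantile_thresholds(scores, alpha, eta=0.05, init=0.0):
    """Online subgradient descent on the pinball loss at level 1 - alpha.

    Returns the threshold used at each round and the overall miscoverage rate.
    A round is a miss when the realized score exceeds the current threshold.
    """
    tau = init
    used, misses = [], 0
    for s in scores:
        used.append(tau)
        miss = s > tau
        misses += miss
        # Raise the threshold after a miss, lower it slightly after a cover.
        tau += eta * ((1.0 if miss else 0.0) - alpha)
    return used, misses / len(scores)

rng = random.Random(0)
scores = [rng.random() for _ in range(20000)]  # toy scores in [0, 1)
used, rate = online_quantile_thresholds(scores, alpha=0.1, eta=0.01)
print(round(rate, 3))  # close to 0.1 (within 0.005 here, by the bound above)
```

With scores in \([0,1)\), the threshold never leaves \([-\eta\alpha, 1+\eta(1-\alpha)]\). The identity above then bounds the deviation from \(\alpha\) by about \(1/(\eta T)\) for any score sequence, adversarial or not.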
Online Minimax Multiobjective Optimization: Multicalibeating and Other Applications

Daniel Lee, Georgy Noarov, Mallesh M. Pai, and Aaron Roth

36th Conference on Neural Information Processing Systems (NeurIPS) 2022

Selected as an Oral Presentation


We introduce a simple but general online learning framework in which, at every round, an adaptive adversary introduces a new game, consisting of an action space for the learner, an action space for the adversary, and a vector-valued objective function that is convex-concave in every coordinate. The learner and the adversary then play in this game. The learner's goal is to play so as to minimize the maximum coordinate of the cumulative vector-valued loss. The resulting one-shot game is not convex-concave, and so the minimax theorem does not apply. Nevertheless, we give a simple algorithm that can compete with the setting in which the adversary must announce their action first, with optimally diminishing regret. We demonstrate the power of our simple framework by using it to derive optimal bounds and algorithms across a variety of domains. This includes no-regret learning: we can recover optimal algorithms and bounds for minimizing external regret, internal regret, adaptive regret, multigroup regret, subsequence regret, and a notion of regret in the sleeping experts setting. Next, we use it to derive a variant of Blackwell's Approachability Theorem, which we term "Fast Polytope Approachability". Finally, we are able to recover recently derived algorithms and bounds for online adversarial multicalibration and related notions (mean-conditioned moment multicalibration and prediction interval multivalidity).
@article{Noarov2021OnlineMM, preview = {multiobjective.png}, selected = {true}, bibtex_show = {true}, video = {https://www.youtube.com/watch?v=ZiB5Y88SawA}, title = {Online Minimax Multiobjective Optimization: Multicalibeating and Other Applications}, author = {Lee, Daniel and Noarov, Georgy and Pai, Mallesh M. and Roth, Aaron}, journal = {36th Conference on Neural Information Processing Systems (NeurIPS)}, year = {2022}, arxiv = {2108.03837}, prize = {Selected as an Oral Presentation} }
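The abstract says the framework recovers optimal bounds for external regret. For orientation, the classical algorithm achieving those bounds is Hedge (multiplicative weights). The sketch below implements Hedge itself, not the paper's minimax multiobjective algorithm. It assumes losses in \([0,1]\), \(N\) experts, and a known horizon \(T\), with the standard learning rate \(\eta = \sqrt{8 \ln N / T}\). For any loss sequence, this rate keeps the learner's expected cumulative loss within \(\sqrt{T \ln N / 2}\) of the best single expert's.

```python
import math
import random

def hedge(loss_rows, eta):
    """Hedge / multiplicative weights over N experts.

    loss_rows: list of rounds, each a list of N losses in [0, 1].
    Returns (learner's cumulative expected loss, best expert's cumulative loss).
    """
    n = len(loss_rows[0])
    weights = [1.0] * n
    learner_loss = 0.0
    cumulative = [0.0] * n
    for losses in loss_rows:
        total = sum(weights)
        probs = [w / total for w in weights]
        # Expected loss of the randomized learner this round.
        learner_loss += sum(p * l for p, l in zip(probs, losses))
        for i, l in enumerate(losses):
            cumulative[i] += l
            weights[i] *= math.exp(-eta * l)
        # Rescale so the largest weight is 1: avoids underflow, leaves probs unchanged.
        top = max(weights)
        weights = [w / top for w in weights]
    return learner_loss, min(cumulative)

rng = random.Random(0)
T, N = 1000, 3
rows = [[rng.random() for _ in range(N)] for _ in range(T)]
eta = math.sqrt(8 * math.log(N) / T)
learner, best = hedge(rows, eta)
print(learner - best <= math.sqrt(T * math.log(N) / 2))  # True
```

In the paper's terms, external regret is one coordinate-wise objective among many. The point of the general framework is that the same minimax machinery also covers internal, adaptive, and multigroup regret, which Hedge alone does not.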
Online Multivalid Learning: Means, Moments, and Prediction Intervals

Varun Gupta, Christopher Jung, Georgy Noarov, Mallesh M. Pai, and Aaron Roth

13th Conference on Innovations in Theoretical Computer Science (ITCS) 2022


We present a general, efficient technique for providing contextual predictions that are "multivalid" in various senses, against an online sequence of adversarially chosen examples \( (x,y)\). This means that the resulting estimates correctly predict various statistics of the labels \(y\) not just marginally (as averaged over the sequence of examples) but also conditionally on \(x \in g\) for any \(g\) belonging to an arbitrary intersecting collection of groups \(G\). We provide three instantiations of this framework. The first is mean prediction, which corresponds to an online algorithm satisfying the notion of multicalibration from Hébert-Johnson et al. [2018]. The second is variance and higher moment prediction, which corresponds to an online algorithm satisfying the notion of mean-conditioned moment multicalibration from Jung et al. [2021]. Finally, we define a new notion of prediction interval multivalidity, and give an algorithm for finding prediction intervals which satisfy it. Because our algorithms handle adversarially chosen examples, they can equally well be used to predict statistics of the residuals of arbitrary point prediction methods, giving rise to very general techniques for quantifying the uncertainty of predictions of black-box algorithms, even in an online adversarial setting. When instantiated for prediction intervals, this solves a problem similar to conformal prediction, but in an adversarial environment and with multivalidity guarantees stronger than simple marginal coverage guarantees.
@article{Gupta2022OnlineML, preview = {multiobjective.png}, selected = {true}, bibtex_show = {true}, title = {Online Multivalid Learning: Means, Moments, and Prediction Intervals}, author = {Gupta, Varun and Jung, Christopher and Noarov, Georgy and Pai, Mallesh M. and Roth, Aaron}, arxiv = {2101.01739}, video = {https://www.youtube.com/watch?v=ceCm3GCOVqU}, journal = {13th Conference on Innovations in Theoretical Computer Science (ITCS)}, year = {2022} }
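The mean-prediction guarantee can be checked empirically. For every group \(g\) and every predicted value \(v\), the residuals \(y - p\) over rounds with \(x \in g\) and prediction \(v\) should nearly cancel. The sketch below computes one common form of this error, normalizing each bucket's summed residual by the total number of rounds \(T\) so that rarely used buckets count for little. The normalization, the bucketing by exact prediction value (predictions assumed to lie on a finite grid), and the function name are assumptions, and the paper's own error measure may differ.

```python
from collections import defaultdict

def multicalibration_error(xs, preds, ys, groups):
    """Largest bucket bias over groups g and predicted values v:
    |sum over {t : x_t in g, p_t = v} of (y_t - p_t)| / T.
    """
    T = len(ys)
    worst = 0.0
    for member in groups.values():
        bias = defaultdict(float)  # predicted value -> summed residual within this group
        for x, p, y in zip(xs, preds, ys):
            if member(x):
                bias[p] += y - p
        for total in bias.values():
            worst = max(worst, abs(total) / T)
    return worst

xs = [0, 1, 2, 3]
preds = [0.5] * 4
ys = [1, 0, 1, 0]
groups = {"all": lambda x: True, "even": lambda x: x % 2 == 0}
print(multicalibration_error(xs, preds, ys, groups))  # 0.25: calibrated overall, biased on "even"
```

The constant forecast 0.5 is perfectly calibrated marginally, yet the error is 0.25 because every even-indexed example has label 1. This is the gap between marginal guarantees and group-conditional (multivalid) ones.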
Computing Approximate Equilibria in Weighted Congestion Games via Best-Responses

Yiannis Giannakopoulos, Georgy Noarov, and Andreas S. Schulz

Mathematics of Operations Research 2022


We present a deterministic polynomial-time algorithm for computing \( d^{d+o(d)} \)-approximate (pure) Nash equilibria in weighted congestion games with polynomial cost functions of degree at most \(d\). This is an exponential improvement of the approximation factor relative to the previously best deterministic algorithm. An appealing additional feature of the algorithm is that it only uses best-improvement steps in the actual game, as opposed to the previously best algorithms, which first had to transform the game itself. Our algorithm is an adaptation of the seminal algorithm by Caragiannis et al. [2011, 2015], but we utilize an approximate potential function directly on the original game instead of an exact one on a modified game. A critical component of our analysis, which is of independent interest, is the derivation of a novel bound of \( [d/W(d/\rho)]^{d+1} \) for the price of anarchy (PoA) of \( \rho \)-approximate equilibria in weighted congestion games, where \(W\) is the Lambert-W function. More specifically, we show that this PoA is exactly equal to \(\Phi_{d,\rho}^{d+1} \), where \( \Phi_{d,\rho} \) is the unique positive solution of the equation \( \rho (x+1)^d = x^{d+1} \). Our upper bound is derived via a smoothness-like argument, and thus holds even for mixed Nash and correlated equilibria, whereas our lower bound is simple enough to apply even to singleton congestion games.
@article{doi:10.1287/moor.2021.1144, preview = {congestion.jpg}, abbr = {Congestion Games}, show = {true}, author = {Giannakopoulos, Yiannis and Noarov, Georgy and Schulz, Andreas S.}, title = {Computing Approximate Equilibria in Weighted Congestion Games via Best-Responses}, arxiv = {1810.12806}, journal = {Mathematics of Operations Research}, volume = {47}, number = {1}, pages = {643-664}, year = {2022}, doi = {10.1287/moor.2021.1144}, url = {https://doi.org/10.1287/moor.2021.1144}, eprint = {https://doi.org/10.1287/moor.2021.1144}, bibtex_show = {true}, html = {https://pubsonline.informs.org/doi/abs/10.1287/moor.2021.1144} }
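The exact price-of-anarchy expression in this abstract is easy to evaluate numerically. The sketch below finds \(\Phi_{d,\rho}\), the unique positive root of \(\rho (x+1)^d = x^{d+1}\), by bisection and returns \(\Phi_{d,\rho}^{d+1}\). The root is unique because \(x^{d+1}/(x+1)^d\) increases strictly from 0 to infinity on \(x > 0\). As a sanity check, for exact equilibria (\(\rho = 1\)) and linear costs (\(d = 1\)) the equation becomes \(x + 1 = x^2\). Its root is the golden ratio, and \(\Phi_{1,1}^2 = (3+\sqrt{5})/2 \approx 2.618\) matches the well-known price of anarchy of weighted congestion games with linear costs. The function names and tolerance are illustrative choices.

```python
import math

def phi(d, rho, tol=1e-12):
    """Unique positive root of rho * (x + 1)**d = x**(d + 1), found by bisection."""
    f = lambda x: x ** (d + 1) - rho * (x + 1) ** d  # negative below the root, positive above
    lo, hi = 0.0, 1.0
    while f(hi) <= 0:  # grow the bracket until the sign changes
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = (lo + hi) / 2.0
        if f(mid) <= 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def poa_approx_equilibria(d, rho):
    """Price of anarchy Phi_{d,rho}^(d+1) of rho-approximate equilibria, as stated in the abstract."""
    return phi(d, rho) ** (d + 1)

print(poa_approx_equilibria(1, 1.0), (3 + math.sqrt(5)) / 2)  # both about 2.618
```

Increasing \(\rho\) raises the root and hence the bound, which quantifies how much efficiency is lost by settling for \(\rho\)-approximate rather than exact equilibria.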

2021

Binary Scoring Rules that Incentivize Precision

Eric Neyman, Georgy Noarov, and S. Matthew Weinberg

22nd ACM Conference on Economics and Computation (EC) 2021


All proper scoring rules incentivize an expert to predict accurately (report their true estimate), but not all proper scoring rules equally incentivize precision. Rather than treating the expert's belief as exogenously given, we consider a model where a rational expert can endogenously refine their belief by repeatedly paying a fixed cost, and is incentivized to do so by a proper scoring rule. Specifically, our expert aims to predict the probability that a biased coin flipped tomorrow will land heads, and can flip the coin any number of times today at a cost of \(c\) per flip. Our first main result defines an incentivization index for proper scoring rules, and proves that this index measures the expected error of the expert's estimate (where the number of flips today is chosen adaptively to maximize the predictor's expected payoff). Our second main result finds the unique scoring rule which optimizes the incentivization index over all proper scoring rules. We also consider extensions to minimizing the \(\ell\)th moment of error, and again provide an incentivization index and optimal proper scoring rule. In some cases, the resulting scoring rule is differentiable, but not infinitely differentiable. In these cases, we further prove that the optimum can be uniformly approximated by polynomial scoring rules. Finally, we compare common scoring rules via our measure, and include simulations confirming the relevance of our measure even outside the domains where it provably applies.
@article{Neyman2021BinarySR, preview = {incentives.jpg}, bibtex_show = {true}, title = {Binary Scoring Rules that Incentivize Precision}, author = {Neyman, Eric and Noarov, Georgy and Weinberg, S. Matthew}, arxiv = {2002.10669}, journal = {22nd ACM Conference on Economics and Computation (EC)}, year = {2021}, poster = {EC_Poster.pdf} }
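The opening claim of this abstract, that proper scoring rules incentivize truthful reports, can be checked numerically. The sketch below uses two standard proper scoring rules, quadratic (Brier-style) and logarithmic. It verifies on a grid that an expert who believes the coin lands heads with probability 0.3 maximizes expected score by reporting exactly 0.3. It does not compute the paper's incentivization index, and the function names and grid are illustrative.

```python
import math

def quadratic_score(report, outcome):
    """Quadratic (Brier-style) scoring rule; higher is better."""
    return 1.0 - (outcome - report) ** 2

def log_score(report, outcome):
    """Logarithmic scoring rule; higher is better. The report must lie in (0, 1)."""
    return math.log(report) if outcome == 1 else math.log(1.0 - report)

def expected_score(rule, report, belief):
    """Expected score when the coin lands heads (outcome 1) with probability `belief`."""
    return belief * rule(report, 1) + (1.0 - belief) * rule(report, 0)

def best_report(rule, belief, grid):
    """Report on the grid that maximizes the expert's expected score."""
    return max(grid, key=lambda r: expected_score(rule, r, belief))

grid = [i / 100 for i in range(1, 100)]  # 0.01, ..., 0.99
for rule in (quadratic_score, log_score):
    print(rule.__name__, best_report(rule, 0.3, grid))  # reports 0.3 for both rules
```

Both rules are proper, yet their expected scores fall off at different rates as the report moves away from the belief. That difference in curvature is what an incentivization index for precision can distinguish.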

2019

AI Education Matters: Building a Fake News Detector

Michael Guerzhoy, Lisa Zhang, and Georgy Noarov

AI Matters 2019


@article{Guerzhoy2019AIEM,
  preview = {fakenews.avif},
  bibtex_show = {true},
  title = {AI Education Matters: Building a Fake News Detector},
  author = {Guerzhoy, Michael and Zhang, Lisa and Noarov, Georgy},
  journal = {AI Matters},
  year = {2019},
  volume = {5},
  pages = {18-20},
  pdf = {http://sigai.acm.org/static/aimatters/5-3/AIMatters-5-3-05-Guerzhoy.pdf},
  html = {https://dl.acm.org/doi/10.1145/3362077.3362082}
}