String patterns in the doped Hubbard model

Understanding strongly correlated quantum many-body states is one of the most thought-provoking challenges in modern research. For example, the Hubbard model, describing strongly correlated electrons in solids, still contains fundamental open questions on its phase diagram. In this work we realize the Hubbard Hamiltonian and search for specific patterns within many individual images of realizations of strongly correlated ultracold fermions in an optical lattice. Upon doping a cold-atom antiferromagnet we find signatures of geometric strings, entities suggested to explain the relationship between hole motion and spin order, in both pattern-based and conventional observables. Our results demonstrate the potential for pattern recognition and more advanced computational algorithms including machine learning to provide key insights into cold-atom quantum many-body systems.

Superposition is among the key paradigms of quantum mechanics and describes quantum systems as simultaneously realizing different classical configurations. Such behavior is believed to be at the heart of phenomena in strongly correlated quantum many-body systems, whose description represents a central problem in condensed matter physics. One of the most intriguing consequences of the superposition principle is the existence of hidden order in correlated quantum systems: while every individual classical configuration is characterized, for example, by particular real-space patterns with varying orientations or positions, the process of averaging over many configurations may lead to an apparent loss of order in conventional observables. On the other hand, a set of projective measurements, each yielding a classical microscopic configuration as an outcome, may reveal the underlying patterns. One notable example of such a system is the one-dimensional (1D) Fermi-Hubbard model at strong coupling [1,2]. While two-point correlations between spins decay rapidly at finite doping, individual microscopic configurations show that dopants merely hide the magnetic correlations, which extend over a larger range [3,4]. Although direct detection of this hidden string order remains inaccessible in solids, quantum gas microscopy is naturally suited to probe it in cold-atom analogs as has been recently shown [5]. Indeed, experiments with ultracold atoms enable projective measurements and generally can provide access to such structures.
The hidden order in 1D is well understood; in contrast, the physics of the 2D Hubbard model is fundamentally more complex due to an intricate interplay between spin and charge degrees of freedom. This model is believed to capture the rich physics of high-temperature superconductivity and other phases [6][7][8] such as the strange metal, stripe, antiferromagnet (AFM), or pseudogap phase, but a unified understanding of these phenomena is still lacking. The advent of quantum simulations, and quantum gas microscopy [9] in particular, provides a new perspective beyond the framework of two-point correlations. Individual snapshots of the quantum mechanical wavefunction can resolve quantum fluctuations and be used to search for hidden order.
Here we perform a microscopic study of the hole-doped Fermi-Hubbard model and report indications of short-range string patterns in 2D over a wide doping range. Our measurements use ultracold fermions in an optical lattice at the lowest currently achievable temperatures, where at low doping AFM correlations extend across the system size [10]. We identify string patterns with varying orientations in individual projective measurements. We explain these signatures with a model of fluctuating geometric strings of displaced spins and obtain quantitative agreement with a theoretically calculated string-length distribution [11,12]. This theory extends earlier work [13][14][15] and establishes a relationship between the AFM parent state at half filling and the strongly correlated quantum states at finite doping, allowing us to predict the doping dependence of observables involving spin operators, such as the full counting statistics of the staggered magnetization, at all accessible doping levels. We also study the doping dependence of observables such as two-point spin and charge correlation functions. Our measurements suggest geometric strings and the associated hidden order as a potential new paradigm for strongly correlated systems, and provide a yet unexplored microscopic perspective on one of the long-standing problems in physics.
FIG. 1 (partial caption). (C) Outline of experimental observables used and theoretical models evaluated. We evaluate theories using both standard observables and a novel pattern-recognition-based approach in our snapshots of the quantum state.

CANDIDATE THEORIES FOR THE DOPED HUBBARD MODEL
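In this work we study the Fermi-Hubbard model, which is defined by the Hamiltonian
\[ \hat{H} = -t \sum_{\langle i,j \rangle, \sigma} \left( \hat{c}^{\dagger}_{i,\sigma} \hat{c}_{j,\sigma} + \mathrm{h.c.} \right) + U \sum_{j} \hat{n}_{j,\uparrow} \hat{n}_{j,\downarrow}, \]
see Fig. 1A. The first term describes tunneling of amplitude $t$ of spin-1/2 fermions $\hat{c}_{j,\sigma}$ with spin $\sigma$ between the sites of a two-dimensional square lattice. The second term includes on-site interactions of strength $U$ between opposite spins, with $\hat{n}_{j,\sigma} = \hat{c}^{\dagger}_{j,\sigma} \hat{c}_{j,\sigma}$. We consider the strongly correlated regime, where $U \gg t$ and doubly occupied sites are energetically costly.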
The Fermi-Hubbard model is well understood when the band is half filled at an average of one particle per site, see Fig. 1B. For temperatures $T \lesssim J$, where $J = 4t^2/U$ is the super-exchange coupling, AFM correlations appear. Although these magnetic correlations are finite-ranged at non-zero temperatures, sufficiently cold finite-size systems can have AFM order across the entire system [10].
Much less is known about the doped Fermi-Hubbard model. However, it is understood that dopant delocalization for kinetic energy minimization competes with spin interactions in the background AFM. Experiments on the cuprates have also shown that at temperatures $T \lesssim J$ and between 10% and 20% doping, the pseudogap (PG) phase crosses over to the strange metal (SM), located above the superconducting dome [7]. The two novel metallic phases (PG and SM) defy a description in terms of conventional quasiparticles and still lack a unified theoretical understanding.
While phenomenological, numerical, and mean-field (MF) approaches have provided key insights in the past, quantum gas microscopy is naturally suited to assess microscopic theoretical approaches. For example, Anderson's resonating valence bond (RVB) picture [16] considers trial wavefunctions of free holes moving through a spin liquid comprised of singlet coverings. Here we consider one particular class of RVB wavefunctions called π-flux states. They stem from a mean-field density matrix $\hat{\rho} = \hat{P}_{GW} \, e^{-\hat{H}_{MF}/k_B T} \, \hat{P}_{GW}$, where $k_B$ is Boltzmann's constant, $\hat{P}_{GW}$ is the Gutzwiller projection, and $\hat{H}_{MF}$ is the quadratic Hamiltonian of itinerant fermions on a square lattice with a Peierls phase of π per plaquette [12]. Snapshots of the trial state in the Fock basis can be obtained by Monte-Carlo sampling, with temperature $T$ as a free fit parameter [17].
A second microscopic approach we examine is the geometric-string theory [11]. We assume that the hole motion only modifies the parent AFM geometry: each hole, independent of other holes, moves by displacing spins along its trajectory by one lattice site. The original AFM quantum state remains otherwise unmodified; this is the frozen-spin approximation [18]. This theory, then, relates doped with half-filled states in that the ordering at half-filling is hidden in doped states via hole motion.
The key ingredient is the theoretical distribution function $p_{\mathrm{th}}(\ell)$ of the string lengths $\ell$, see Fig. 1C, which we derive from microscopic considerations of quantum coherent hole motion with input parameters $t$, $U$, and $T$ [12].
We directly assess these microscopic theoretical approaches with a quantum gas microscope, which provides projective measurements of the quantum mechanical wavefunction for the doped Hubbard model in the parity-projected Fock basis. Our experimental setup consists of a balanced two-component gas of fermionic lithium in the lowest band of a square optical lattice, as reported previously [19], and we image one or both of the two spin states [20]. Entropy redistribution with a digital micro-mirror device enables a disk-shaped homogeneous system of approximately 80 sites with temperatures as low as T/J = 0.50(4) [10]. We can alter the local chemical potential and dope the system, maintaining independent temperature control [12]. We determine the doping from the single-particle occupation density and temperature from the nearest-neighbor spin correlator, both by comparing to numerics [12].
FIG. 2 (partial caption). (B, C) Change in string-pattern length histograms upon doping, for temperatures below and above the superexchange energy J. The histogram shows significant deviation upon doping for the cold data, but no noticeable change upon doping for the hot data. (D) Relative and absolute (inset) difference between doped and sprinkled-hole pattern-length histograms, highlighting temperature-dependent sensitivity. (E) Regions of the phase diagram examined in B, C. The string-pattern observable has sensitivity at temperatures below J and below intermediate doping. In B, C, D, error bars represent the standard error of the mean.

PATTERN RECOGNITION OF GEOMETRIC STRINGS
We design a pattern recognition algorithm for geometric strings which we apply to real-space snapshots where doublons and one of the two spin states have been removed, see Fig. 2A. Because geometric strings describe a relationship between doped and half-filled AFMs, we search for string-like patterns in the deviation between these images and a classical checkerboard. For each image, we take the set of sites which deviate and extract string patterns using the following rules: (1) every string pattern is a connected subset of sites forming a path without branching points, (2) each site can only be part of one string pattern, (3) longer string patterns are favored, and (4) every string pattern must have at one end a site which is detected as empty, and therefore consistent with having a hole on that site [12].
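As an illustration, the four rules above can be sketched in a few lines of Python. This is a simplified sketch under stated assumptions, not the algorithm of [12]: the image is assumed to be a binary array (1 for a detected atom, 0 otherwise), the reference checkerboard is taken to be whichever of the two sublattice phases deviates least from the image, and "longer string patterns are favored" is implemented greedily by repeatedly extracting the longest self-avoiding path. The function names are hypothetical.

```python
def deviation_sites(image, phase):
    """Sites where a binary image differs from a classical checkerboard.

    image[r][c] is 1 if an atom was detected and 0 otherwise (assumed encoding);
    phase (0 or 1) selects which sublattice is expected to be occupied.
    """
    sites = set()
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            expected = 1 if (r + c + phase) % 2 == 0 else 0
            if value != expected:
                sites.add((r, c))
    return sites


def longest_path_from(start, sites):
    """Longest self-avoiding nearest-neighbor path inside `sites`, starting at `start`."""
    best = [start]
    stack = [[start]]
    while stack:
        path = stack.pop()
        if len(path) > len(best):
            best = path
        r, c = path[-1]
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if nxt in sites and nxt not in path:
                stack.append(path + [nxt])
    return best


def extract_string_patterns(image):
    """Greedy extraction of string patterns; each pattern starts on an empty site."""
    # Assumption: use the checkerboard phase with the fewest deviations.
    sites = min((deviation_sites(image, p) for p in (0, 1)), key=len)
    patterns = []
    while True:
        # Rule 4: one end of every pattern must be a site detected as empty.
        starts = [s for s in sites if image[s[0]][s[1]] == 0]
        if not starts:
            break  # leftover deviating sites without an empty end are discarded
        # Rules 1 and 3: take the longest non-branching path available.
        best = max((longest_path_from(s, sites) for s in starts), key=len)
        patterns.append(best)
        # Rule 2: each site belongs to at most one pattern.
        sites = sites - set(best)
    return patterns
```

The exhaustive path search is exponential in the cluster size, which is acceptable only because the deviating clusters in a system of roughly 80 sites are small; a production implementation would likely need a more careful treatment of ties and branching.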
The detected string patterns, then, are classical entities obtained solely through image analysis, in contrast to geometric strings which are quantum mechanical, proposed physical constituents of doped AFMs. While the detection of string patterns with this algorithm does not constitute a direct measurement of geometric strings, in part due to the approximation of a quantum AFM by a classical checkerboard, we find that this algorithm is sensitive to hole doping. For experimental data taken at a doping δ, our pattern recognition algorithm extracts a string-pattern length distribution $p_\delta(\ell)$ for pattern lengths $\ell$, normalized to the system size. At temperatures below 0.7J, this distribution indicates a statistically significant increase in measured string patterns as the sample is doped from half-filling to 10.0(8)%, see Fig. 2B. The length distribution appears roughly exponential and string patterns of lengths up to 12 sites are detected.
The appreciable distribution of string patterns $p_0(\ell)$ detected at half-filling is well understood and can be reproduced through Heisenberg quantum Monte Carlo simulation [12]. In part, the quantum nature of the AFM and its nonzero temperature cause deviations from a classical checkerboard. While some of these effects are unavoidable, we can reduce fluctuations from an underlying SU(2) symmetry of the system by reducing the system size to a diameter of 7 sites and post-selecting on the staggered magnetization; however, results are robust to the exact post-selection scheme used [12]. Due to the non-zero $p_0(\ell)$, for non-zero doping we also look at the absolute and relative differences between the pattern-length histogram and a pattern-length histogram $p_\delta^{s}(\ell)$ based upon $p_0(\ell)$, see Fig. 2D. While the absolute difference does not recover the exact analytic string distribution because of imperfect detection of the pattern recognition algorithm [12], it does find a qualitatively similar distribution. Remarkably, at 10.0(8)% doping we find over 4 times as many length-8 patterns as at half filling, reflecting the large impact of holes in the spin background.
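The histogram comparison described here can be sketched as follows. This is a minimal sketch under assumptions: pattern lengths are given as one list per image, the distribution is normalized by the number of images times the number of sites, and the relative difference is taken as the absolute difference divided by the reference histogram (the text does not spell out this normalization). The function names are hypothetical.

```python
from collections import Counter


def length_distribution(lengths_per_image, n_sites, max_len):
    """Pattern-length histogram, normalized per image and per lattice site."""
    counts = Counter(l for lengths in lengths_per_image for l in lengths)
    norm = len(lengths_per_image) * n_sites
    return [counts.get(l, 0) / norm for l in range(1, max_len + 1)]


def histogram_differences(p_doped, p_reference):
    """Absolute and relative difference between two length histograms.

    The relative difference is undefined (NaN) where the reference is zero.
    """
    absolute = [a - b for a, b in zip(p_doped, p_reference)]
    relative = [(a - b) / b if b > 0 else float("nan")
                for a, b in zip(p_doped, p_reference)]
    return absolute, relative
```

Here `p_reference` would play the role of the sprinkled-hole histogram and `p_doped` that of the measured histogram at the same doping.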
To compare the experimental results to the string model, we produce artificial images and evaluate them with our string pattern detection algorithm. These images are generated by randomly placing holes into actual experimental images taken at half filling, then randomly propagating each hole according to the analytically generated string length histogram (see Fig. 1C) and appropriately displacing the spins along the hole's path [12]. Note that this approach fully preserves the SU(2) symmetry of the system. The resulting string-pattern length distribution for 10% doping agrees remarkably well with experimental data, especially given that the theory has no free parameters [12].
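The image-generation step can be illustrated with a short sketch. Several choices below are assumptions made for illustration and are not taken from [12]: snapshots are encoded as +1 and -1 for the two spin states and 0 for a hole, the hole follows a random self-avoiding path, and `p_length[k]` is a hypothetical list giving the probability of a string of k steps.

```python
import random

NEIGHBORS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def move_hole(snapshot, start, length, rng):
    """Move the hole at `start` along a random path of up to `length` steps.

    Each step swaps the hole with a neighboring spin, so every spin on the
    path is displaced by one site. Returns the list of visited sites.
    """
    rows, cols = len(snapshot), len(snapshot[0])
    path = [start]
    for _ in range(length):
        r, c = path[-1]
        options = [(r + dr, c + dc) for dr, dc in NEIGHBORS
                   if 0 <= r + dr < rows and 0 <= c + dc < cols
                   and (r + dr, c + dc) not in path
                   and snapshot[r + dr][c + dc] != 0]
        if not options:
            break  # dead end: the string stops early
        nr, nc = rng.choice(options)
        snapshot[r][c], snapshot[nr][nc] = snapshot[nr][nc], snapshot[r][c]
        path.append((nr, nc))
    return path


def add_string(snapshot, p_length, rng):
    """Place one hole on a random occupied site and attach a string to it."""
    occupied = [(r, c) for r, row in enumerate(snapshot)
                for c, value in enumerate(row) if value != 0]
    r, c = rng.choice(occupied)
    snapshot[r][c] = 0  # remove the particle: this site is now the hole
    length = rng.choices(range(len(p_length)), weights=p_length)[0]
    return move_hole(snapshot, (r, c), length, rng)
```

Calling `add_string` with `p_length = [1.0]` leaves the hole where it was placed, which corresponds to simply inserting a hole without displacing any spins.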
To verify whether our measured signal simply results from the introduction of holes rather than changes to the spin background, we compare with simulations where holes are artificially and randomly placed ("sprinkled") into experimental data taken at half-filling, equivalent to placing 1-site-long strings. The associated string-pattern length distribution $p_\delta^{s}(\ell)$ fails to explain the experimental results, revealing the nontrivial interplay of spin and charge degrees of freedom in the 2D doped Hubbard model. We also compare to π-flux states by producing simulated images at 10% doping. Here the length distribution does agree with experiment, even though the simulated images are completely independent from experimental data, except for a fitted effective temperature.
We repeat the measurement for a sample heated prior to lattice loading to investigate temperature effects. Fig. 2C shows experimental data at half-filling as well as at 10.1(8)% doping, along with the theoretical prediction, for samples at temperatures between 1.3(1)J and 1.8(1)J. Notably, in contrast to colder temperatures there is no statistically significant deviation between the experimental data with and without hole doping; $p_{0.1}(\ell) \approx p_0(\ell)$. For these temperatures, thermal excitations cause deviations from a classical checkerboard which are so large that even at half filling, there are so many string patterns that they mask additional effects from doping. These deviations appear to set an upper bound on the density of detectable string patterns in the finite-size system, as shown in Fig. 2E. As a result, we plot the pattern length distribution for high-temperature and half-filling as a reference for the cold temperature datasets in Fig. 2B.
We examine the total number of detected string patterns for a closer study of the role of doping. We only include string patterns of lengths greater than 2 sites to avoid large contributions from quantum fluctuations such as doublon-hole pairs or spin-exchange processes [12]. In Fig. 3A we plot this string-pattern count normalized to the system size as a function of doping, at temperatures below 0.7J. For small doping, the number of string patterns increases linearly, indicating that the number of string patterns we detect is a fixed fraction of the number of free holes in our system and that string patterns are isolated from one another. In contrast, at a doping of about 16% the number of detected string patterns saturates. This is consistent with a high string density where strings begin to overlap or lie adjacent to one another. In this regime, the spin order may become scrambled to the point where increasing the number of geometric strings does not significantly increase the number of string patterns detected. The string model agrees well with the experimental data; therefore, we use the slope of the string model prediction in the low doping regime to estimate a string detection efficiency for strings with lengths greater than 2 sites. We measure a slope of $1 \times 10^{-3}$ string patterns per site per percent doping, which yields a detection efficiency of 15.4% given the analytic string length distribution [12]. The continued agreement at large doping suggests that the mere increase in number of string states is sufficient to explain the experimental data. We investigate other possible causes for saturation and find that longer strings yield similar saturation values [12]. However, agreement persists between geometric-string theory and experiment in the absolute difference $p_\delta(\ell) - p_\delta^{s}(\ell)$ across a wide doping range, suggesting that longer strings may not be necessary.
FIG. 3 (partial caption). Doped AFMs exhibit longer-length string patterns compared to heated AFMs, even when the staggered magnetization or nearest-neighbor spin correlator is equal and holes are sprinkled in to equate doping levels. (C) Total string count at 10% doping as a function of temperature, with corresponding sprinkled string count subtracted. Sensitivity to strings decreases with temperature due to decreased order in the parent AFM as seen in the sprinkled string count, inset. In A, B, error bars on the doping are calculated as in [12]. All other error bars are standard errors of the mean.
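The arithmetic linking the measured slope to the quoted detection efficiency can be made explicit. The sketch below rests on an assumption that is ours and not stated in the text: each doped hole carries exactly one geometric string, so that the efficiency is the number of detected patterns per hole divided by the fraction of strings longer than 2 sites. Under that assumption the quoted numbers imply the fraction computed at the end; this implied value is an inference, not a number reported in the paper.

```python
def patterns_per_hole(slope_per_site_per_percent):
    """Convert a slope in patterns per site per percent doping to patterns per hole.

    One percent doping corresponds to 0.01 holes per site.
    """
    return slope_per_site_per_percent / 0.01


def implied_long_string_fraction(slope_per_site_per_percent, efficiency):
    """Fraction of strings longer than 2 sites implied by slope and efficiency.

    Assumes one string per doped hole (an assumption of this sketch).
    """
    return patterns_per_hole(slope_per_site_per_percent) / efficiency


detected = patterns_per_hole(1e-3)                      # about 0.1 patterns per hole
fraction = implied_long_string_fraction(1e-3, 0.154)    # about 0.65
```

In other words, roughly one in ten doped holes produces a detected pattern longer than 2 sites at low doping.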
The experimental string count is significantly larger than that of the sprinkled-hole simulation; nonetheless, the sprinkled hole results extend beyond the half-filling string count, suggesting some increase in detected string patterns simply due to the introduction of holes. Because of this effect, we use the sprinkled hole length distributions $p_\delta^{s}(\ell)$ rather than $p_0(\ell)$ for the absolute and relative differences in Figs. 2 and 3. The string count from π-flux states qualitatively agrees with experimental data, but quantitatively predicts an excess of string patterns at low doping and a deficit at high doping. The largest deviations occur at low doping, which may be related to the absence of long-range order at zero temperature in π-flux states at half-filling.
The average string-pattern length quantifies the size of the region around the hole where the spin pattern is distorted by the string, see Fig. 3B. The observed values are comparatively small, which is mostly caused by the large contributions from quantum fluctuations at half-filling. The average string-pattern length does not change dramatically with doping, consistent with independent patterns; however, at larger doping we observe a slight decrease in average length that coincides with the observed saturation in the string count. This behavior is captured by the geometric-string model for low and intermediate doping. At high doping, the theory exhibits shorter average string lengths than the experiment, which may be due to pattern detection or high-string-density effects which are not included in the theory.
We compare these results to a dataset where geometric strings are not expected to occur. This dataset consists of experimental images taken at various temperatures at half filling with sprinkled holes to match doping levels. Temperatures are chosen to match the measured staggered magnetization and obtain a dataset that captures the observed loss of AFM order [12]. Notably, the average string-pattern length reveals that this loss through heating occurs in a fundamentally different way than through doping. For all nonzero doping, the temperature-based dataset exhibits shorter average string-pattern lengths than the experimentally measured doping dataset. As doping increases, the average length monotonically decreases. We can alternatively match the nearest-neighbor spin correlator instead of the staggered magnetization, and find an even greater distinction between the doped and temperature-based datasets.
We can better understand the role of temperature in string-pattern detection by observing how the string count varies with temperature at fixed doping. For 10% doping, we plot the difference between the experiment and sprinkled-hole string counts, see Fig. 3C, with the individual values plotted in the inset. At our lowest temperatures, the difference is at its greatest. This high sensitivity is consistent with the greatest spin ordering for the parent AFM at low temperatures, accompanied by a relatively large string count from the experimental data. The difference decreases steadily with increasing temperature, predominantly due to the increase in the sprinkled-hole string count as a consequence of decreased spin ordering in the parent AFM.
String-pattern observables offer a new way to compare experimental results to microscopic theoretical predictions. We detect signatures of string patterns that change significantly when the system is doped. Further, these patterns are differentiated from those that appear when heating the system. The presence and behavior of these patterns upon changing system parameters provide evidence for a geometric-string model at temperatures just below the super-exchange energy across a wide range of doping values.

SPIN CORRELATIONS AND STAGGERED MAGNETIZATION
An accurate microscopic framework for the Fermi-Hubbard model should also be able to predict more conventional observables such as two-point correlation functions. To that end, we measure the sign-corrected spin-spin correlation function
\[ C_s(d) = (-1)^{\|\mathbf{d}\|} \, \frac{1}{N_{\mathbf{d}}} \sum_{i} \frac{1}{S^2} \left( \langle \hat{S}^z_i \hat{S}^z_{i+\mathbf{d}} \rangle - \langle \hat{S}^z_i \rangle \langle \hat{S}^z_{i+\mathbf{d}} \rangle \right) \]
for displacements $|\mathbf{d}| = d$, averaged over all sites $i$ in the system (with $N_{\mathbf{d}}$ the number of site pairs at displacement $\mathbf{d}$) and all experimental realizations, where $\hat{S}^z_i$ is the spin-$S$ operator on site $i$, $S = 1/2$, and $\|\mathbf{d}\|$ denotes the $L_1$ norm of $\mathbf{d}$, by measuring charge correlations in experimental realizations with and without spin removal [20]. Due to the sign correction $(-1)^{\|\mathbf{d}\|}$, positive correlator values indicate AFM ordering. Figure 4A shows the change in the nearest neighbor, diagonal next-nearest neighbor, and straight next-nearest neighbor spin correlators ($C_s(1)$, $C_s(\sqrt{2})$, and $C_s(2)$, respectively) as a function of doping at $T < 0.7J$ [12]. At half-filling, $C_s(1)$ is substantially larger than both $C_s(\sqrt{2})$ and $C_s(2)$ due to a strong admixture of spin singlets on adjacent sites [21]. As the system is doped, all correlators exhibit a reduction in magnitude as the holes frustrate AFM order. While $C_s(1)$ remains positive for all experimentally-realized doping values, both $C_s(\sqrt{2})$ and $C_s(2)$ exhibit a sign change around 20% doping. These features have been observed in experiment [20,22] and numerics [22], and are good benchmarks for the evaluation of theoretical models.
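For simulated images in which the spin on every site is known, a sign-corrected correlator of this kind can be evaluated directly. The sketch below makes several assumptions: snapshots are fully spin-resolved arrays with entries +1/2, -1/2, or 0 for a hole (unlike the experiment, which infers spin correlations from charge correlations with and without spin removal), the correlator is connected and normalized by S², and only the single displacement vector passed in is used rather than all symmetry-equivalent ones. The function name is hypothetical.

```python
def spin_correlator(snapshots, d, S=0.5):
    """Sign-corrected, connected spin correlator for displacement d = (dx, dy).

    snapshots: list of 2D lists with entries +S, -S, or 0 (assumed encoding).
    Positive return values indicate antiferromagnetic ordering.
    """
    dx, dy = d
    n = len(snapshots)
    rows, cols = len(snapshots[0]), len(snapshots[0][0])
    total, n_pairs = 0.0, 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dx, c + dy
            if not (0 <= r2 < rows and 0 <= c2 < cols):
                continue
            a = [s[r][c] for s in snapshots]
            b = [s[r2][c2] for s in snapshots]
            mean_ab = sum(x * y for x, y in zip(a, b)) / n
            total += mean_ab - (sum(a) / n) * (sum(b) / n)
            n_pairs += 1
    if n_pairs == 0:
        raise ValueError("displacement does not fit inside the lattice")
    sign = (-1) ** (abs(dx) + abs(dy))  # L1 norm of the displacement
    return sign * total / (n_pairs * S ** 2)
```

For an equal mixture of the two classical Néel states this function returns 1 at every displacement, which is the sense in which positive values signal AFM order.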
We make predictions for spin correlations based on ensembles of images from sprinkled holes, geometric-string theory, and π-flux states without post-selection [12]. In the vicinity of half-filling, sprinkled holes and the string model accurately describe the experimentally-measured data. As these theories make predictions based on the experimental data at half-filling, this agreement is by construction. Away from half-filling, sprinkled holes underestimate the decrease of the correlators since they fail to account for the disruption of AFM order as the system is doped. In contrast, the string model overestimates the decrease of $C_s(1)$, which could stem from backaction of the background state after string state formation. However, it explains the decrease of $C_s(\sqrt{2})$ and $C_s(2)$ on a quantitative level. The π-flux model performs well and accurately predicts $C_s(1)$ far from half-filling, but fails to predict the sign change of $C_s(\sqrt{2})$ and $C_s(2)$ at intermediate doping. In particular, the sign change of $C_s(\sqrt{2})$ and $C_s(2)$ is an interesting qualitative feature that is predicted by the string model. As a direct result of spins being displaced by one site when a string passes through, $C_s(1)$ is mixed into $C_s(\sqrt{2})$ and $C_s(2)$. Because $C_s(1)$ reflects opposite spin alignment from $C_s(\sqrt{2})$ and $C_s(2)$, this mixing results in a sign change once the contribution of $C_s(1)$ exceeds that of the original correlation strength at some critical doping.
Cold atom experiments provide access to full-counting statistics (FCS) due to their ability to project and measure an entire quantum system at once [10]. We measure the FCS of the staggered magnetization operator,
\[ \hat{m}^z = \frac{1}{N} \sum_{i} (-1)^{i_x + i_y} \, \frac{\hat{S}^z_i}{S}, \]
for system size $N$ across all experimental realizations as we dope the system, see Fig. 4B. As expected, the staggered magnetization distribution narrows, reflecting the finite-size crossover from the AFM-ordered phase [10]. The sprinkled-hole simulation does not exhibit a major change in the distribution as the system is doped, as it fails to account for the disruptive effect of holes on the AFM order. The π-flux states predict an overly narrow distribution close to half-filling, possibly because they fail to predict long-range order at half-filling [7]. However, they accurately predict the staggered magnetization FCS at larger doping values where RVB-based models are expected to perform well. In contrast, the string model predicts the distribution function very closely across all dopings. Across not only newly introduced observables, but conventional observables as well, we find strongest agreement between experiment and the string model as compared to the other theoretical frameworks considered.
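The full counting statistics of a staggered magnetization can be sketched for spin-resolved simulated snapshots as follows. This is a sketch under assumptions: sites are encoded as +1/2, -1/2, or 0 for a hole, the magnetization is normalized by the number of sites and by S so that a classical Néel state gives ±1, and the binning into equal-width bins on [-1, 1] is an arbitrary illustrative choice. The function names are hypothetical.

```python
def staggered_magnetization(snapshot, S=0.5):
    """Staggered magnetization of one snapshot, normalized to lie in [-1, 1]."""
    n_sites = sum(len(row) for row in snapshot)
    total = sum((-1) ** (r + c) * sz
                for r, row in enumerate(snapshot)
                for c, sz in enumerate(row))
    return total / (n_sites * S)


def full_counting_statistics(snapshots, n_bins=21):
    """Normalized histogram of the staggered magnetization over all snapshots."""
    hist = [0] * n_bins
    for snap in snapshots:
        m = staggered_magnetization(snap)
        # Map m in [-1, 1] to a bin index; m = 1 falls into the last bin.
        k = min(int((m + 1) / 2 * n_bins), n_bins - 1)
        hist[k] += 1
    return [h / len(snapshots) for h in hist]
```

Because each snapshot yields one value of the magnetization, the histogram retains information about the shape of the distribution that is lost when only its mean or variance is reported.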

ANTI-MOMENT CORRELATIONS
All observables studied in this work thus far have focused on the spin sector of the Hubbard model. Now we examine correlations in the charge sector. At sufficiently low temperatures, one may expect signatures of pairing [8,23] or stripe phases [24,25], which lead to hole bunching. On the other hand, anti-correlations of the holes, as observed previously at elevated temperatures [22], are expected in the strongly correlated metallic regime of the Hubbard model. The transition between these two regimes in the Hubbard model phase diagram is not yet fully understood; however, the currently accessible experimental regime allows us to place new bounds on where this transition can occur.
In our experiment, doubly-occupied sites appear as empty when imaged and the exact hole correlation is not directly accessible; rather, we measure "anti-moment" correlations $C_h(|\mathbf{d}|)$ at a distance $|\mathbf{d}|$ which include contributions from doublon-doublon and doublon-hole correlations:
\[ C_h(|\mathbf{d}|) = \frac{1}{N_{\mathbf{d}}} \sum_{i} \left( \langle (1 - \hat{n}_{s,i}) (1 - \hat{n}_{s,i+\mathbf{d}}) \rangle - \langle 1 - \hat{n}_{s,i} \rangle \langle 1 - \hat{n}_{s,i+\mathbf{d}} \rangle \right), \]
where $\hat{n}_{s,i}$ is the single particle occupation on site $i$ and $N_{\mathbf{d}}$ is the number of site pairs at displacement $\mathbf{d}$. At half-filling, numerics indicate positive anti-moment correlations at the percent level for nearest neighbors, dominated by positive doublon-hole correlations [22]. Doublon-hole pairs beyond nearest-neighbors become increasingly unlikely; therefore, to avoid the effects of doublon-hole pairs we focus on correlations at distances greater than 1. We find anti-moment correlators at half-filling (see Fig. 5B) to be weaker than predicted according to numerics, which may result from imperfect imaging fidelity. However, this effect only weakens the magnitude of the anti-moment correlators measured; as such in this section we focus on qualitative conclusions from the experimental data. Figure 5A shows the anti-moment correlation for 3% and 19% doping. While holes appear uncorrelated close to half-filling, at larger doping qualitatively different behavior appears. We find statistically significant anti-moment anticorrelations out to distances greater than 2, reflecting hole-hole repulsion in this regime. Microscopically, such repulsive interactions can arise from the existence of a low-lying bound state of two holes [26]. Here we do not consider geometric-string theory or sprinkled holes because both introduce uncorrelated holes by construction. Additionally, we compare to π-flux states without doublon-hole pairs to avoid unintended artifacts in the anti-moment correlator. However, for reference we plot the predicted hole-hole correlation function for a phenomenological model of spinless fermionic chargons with nearest-neighbor hopping and temperatures between 0.5J and 0.7J [27].
Here strong anticorrelations result from Pauli repulsion between the fermionic chargons, but qualitatively similar behavior is expected for bosonic chargons with hard-core interactions. We find that both theories qualitatively describe the experimental result.
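An anti-moment correlator of the kind discussed in this section can be estimated from parity-projected images with a short routine. The sketch below assumes binary images in which 1 marks a singly occupied site and 0 marks an empty-looking site (a hole or a doublon), takes the correlator to be connected, and evaluates a single displacement vector rather than averaging over all symmetry-equivalent ones. The function name is hypothetical.

```python
def antimoment_correlator(singles_snapshots, d):
    """Connected correlator of anti-moments (1 - n_s) at displacement d = (dx, dy).

    singles_snapshots: list of 2D lists with entries 1 (single atom) or 0.
    Negative values indicate that empty-looking sites avoid each other.
    """
    dx, dy = d
    n = len(singles_snapshots)
    rows, cols = len(singles_snapshots[0]), len(singles_snapshots[0][0])
    total, n_pairs = 0.0, 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dx, c + dy
            if not (0 <= r2 < rows and 0 <= c2 < cols):
                continue
            a = [1 - s[r][c] for s in singles_snapshots]
            b = [1 - s[r2][c2] for s in singles_snapshots]
            mean_ab = sum(x * y for x, y in zip(a, b)) / n
            total += mean_ab - (sum(a) / n) * (sum(b) / n)
            n_pairs += 1
    if n_pairs == 0:
        raise ValueError("displacement does not fit inside the lattice")
    return total / n_pairs
```

Because doublons and holes are indistinguishable in such images, this quantity mixes hole-hole, doublon-hole, and doublon-doublon contributions, which is why distances greater than 1 are the more informative ones.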
The emergence of this repelling behavior can be characterized by plotting the anti-moment correlation as a function of doping for $d = \sqrt{2}$ and $d = 2$, see Fig. 5B. Beyond the intermediate doping regime, negative correlations appear at distances of $\sqrt{2}$ and 2, suggesting a growth of hole-hole repulsion with doping. Furthermore, the presence of anti-moment correlations between sites of differing sub-lattices at $d = 1$ provides evidence against holes tunneling preferentially between sites of one sub-lattice, as predicted by theories of point-like magnetic polarons with a dispersion minimum at (π/2, π/2) in the Brillouin zone [28][29][30][31].
Finally, we plot a normalized $g^{(2)}(d = \sqrt{2})$ to account for the difference between doped holes and holes in doublon-hole pairs and quantify the relative fraction of doped holes that are anticorrelated: for doping δ, see Fig. 5C. This rescaling allows direct comparison to the $g^{(2)}$ function for theories without doublon-hole pairs. Because the number of free holes is so small for doping below 5%, no statistically significant statements can be made about the behavior of holes in this regime. In the geometric-string theory, we assume that chargons are completely uncorrelated with each other, but due to their fermionic statistics Pauli blocking should actually introduce anti-correlations which have not yet been included in our analyses. In the low doping, dilute string regime, anti-correlation strength can be estimated by point-like magnetic polarons where the known dispersion relation of the dressed hole [32] is used to define a tight-binding hopping model of the polaron. Figure 5C shows that our data is incompatible with this model, which predicts significantly weaker hole-hole anticorrelations. Similar behavior is predicted by the π-flux theory, which models the doped holes as point-like objects moving in a quantum spin liquid of singlets. An improved picture is to consider the finite extent of the magnetic polaron which results from the spinon-chargon bound state predicted by geometric-string theory. Indeed, at sufficiently large doping, chargons can influence each other and their hard-core character is expected to introduce anti-correlations. In this regime, which corresponds to doping above 12% to 15% for our Hubbard parameters and temperature, geometric strings overlap substantially and give rise to a collective state of fluctuating strings. This could lead to strong modifications of the correlations because the dispersion relation of the chargons would no longer be determined by the spinons.
Indeed, in this doping regime, the experimental results seem to be remarkably similar to spinless chargons, in agreement with earlier theoretical work in the strange-metal regime [27,33].

CONCLUSIONS AND OUTLOOK
In this article, we have created and measured new observables to assess microscopic theories of the doped Hubbard model. These string-pattern-based observables leverage the power of quantum simulation to fully collapse the quantum state in a single experimental instance, and therefore represent an abstraction from and complement to established observables such as correlation functions or full counting statistics. Across all observables considered, we find good consistency between the geometric-string theory and experimental data. Moreover, the string-pattern observables suggest the presence of hidden order that is obscured by solely considering conventional correlation functions.
At intermediate doping values, we find evidence for the onset of repulsion between holes in an AFM. Of the theoretical models considered, the experimentally measured correlations agree best with a phenomenological theory of free fermionic chargons. Further investigation is required to definitively determine whether the anticorrelation up to a distance of two lattice sites is due to quantum statistics or repulsive interactions. While signatures of other phases such as stripe phases, incommensurate spin order, or nematic fluctuations have not yet been observed in this system, they are predicted to emerge at lower temperatures.
The ideas presented in this article for formulating new observables can be extended to other real-space patterns, for example patterns which reflect the underlying physics of other candidate microscopic theories for the doped Hubbard model. Moreover, machine learning techniques could be used to directly compare sets of raw experimental atom distributions to theoretical models without the need for intermediate observables. This class of techniques is highly promising as quantum simulations of the Hubbard model continue to probe lower temperatures within the pseudogap and strange metal phases, but can also be applied to spatially resolved studies of quenches across phase transitions [34], dynamical phase transitions [35], and higher-order scattering processes [36]. Possible extensions of our work include systems with anisotropic spin interactions [11] or doped SU(N) spin models [37].

* To whom correspondence and requests for materials should be addressed: greiner@physics.harvard.edu.

ATOMIC SAMPLE PREPARATION
Details about the experimental setup, including the procedures used to create the low-temperature Fermi gas and set the doping value, can be found in [S1]. The temperature of the gas is increased via the process described in [S14]. The value of U/t is calibrated as described in [S15].

DOPING DETERMINATION
In our experiment, we measure the percentage of sites occupied by single particles (singles density). We use numerical simulations to obtain the doping as a function of singles density. For data between T = 0.6J and T = 0.8J we use data obtained from a determinantal quantum Monte Carlo algorithm [S16], and for all larger temperatures we use data obtained from a numerical linked-cluster expansion algorithm [S17]. For T < 0.6J, the sign problem becomes significant. As a result, in this regime we use data at T = 0.6J, as the density sector of the equation of state is relatively insensitive to temperature here. We account for an imaging fidelity of 98.5%. When statistical fluctuations cause the singles density to exceed the numerically-obtained singles density at half-filling, we treat those samples as at half-filling.

Error analysis
When determining the standard error of doping values for each experimental dataset, we assume that the particle density is linearly dependent on singles density. We apply a linear fit to doping versus singles density from the numerical simulation mentioned above, yielding approximately δ = 1.22 × (0.905 − n_s), where δ is doping and n_s is the singles density. We then calculate the standard error of the singles density and use the linear fit result to get the standard error of the mean doping value.
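As an illustration, the conversion from singles density to doping and the propagation of its standard error can be sketched as follows. This is a minimal sketch using only the fit coefficients quoted above; the function names are hypothetical, and using the fit intercept 0.905 as the half-filling cutoff is an assumption made here for simplicity.

```python
import numpy as np

# Fit coefficients quoted in the text: delta = SLOPE * (N_HALF - n_s)
SLOPE = 1.22
N_HALF = 0.905  # assumption: fit intercept used as the half-filling singles density


def doping_from_singles(n_s):
    """Doping from singles density; densities above N_HALF count as half-filling."""
    return SLOPE * np.clip(N_HALF - np.asarray(n_s, dtype=float), 0.0, None)


def mean_doping_with_error(singles_samples):
    """Mean doping of one dataset and its standard error (hypothetical helper)."""
    s = np.asarray(singles_samples, dtype=float)
    sem_singles = s.std(ddof=1) / np.sqrt(s.size)
    # Linear error propagation: the fit slope rescales the standard error.
    return SLOPE * (N_HALF - s.mean()), SLOPE * sem_singles
```

For example, `mean_doping_with_error([0.80, 0.82, 0.81])` returns the mean doping of three hypothetical singles-density measurements together with its standard error.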
Since the actual doping value varies across datasets, we group datasets by their mean doping values within windows of width 2%. This yields a single mean doping value d for the entire group. The associated uncertainty ∆ is determined by assuming each dataset k within the group was taken at a different doping value d_k with a corresponding uncertainty δd_k. Then ∆ can be calculated as:

Details on the algorithm
A detailed schematic for the string pattern detection algorithm can be found in Fig. S1. In order to detect string patterns in our system, we first calculate the staggered magnetization and choose the classical checkerboard pattern with the same sublattice magnetization as reference for each image separately. In the next step, the postselection procedure described in section 3.7 is applied to focus on cases with Néel order along the measurement axis. In each of the images selected for further analysis, we identify all sites that deviate from the reference checkerboard pattern. A subset of these sites is called a string pattern if it is possible that a hole moving through the reference checkerboard leaves exactly this trace. Therefore, the deviating sites must be continuously connected via nearest neighbors. Apart from the two endpoints, each site in the string pattern must have at least two nearest neighbors which are part of the same pattern. Furthermore, at least one of the two endpoints must be detected as empty, thus being consistent with the hole being at this site. For each subset of connected sites that deviate from the checkerboard pattern, we detect the longest possible string pattern and subsequently trace it back to restore the antiferromagnetic (AFM) order. This procedure is repeated until no more such patterns can be found. Up to occupied sites that should be empty the perfect checkerboard is thus restored across the entire image. The described detection scheme is the conceptually simplest algorithm that is capable of identifying possible strings. Overlaps of different strings or perturbations and loops within one string are not treated correctly, because sites that do not deviate from the reference state are not taken into account. We discuss the robustness of our results with respect to different string pattern detection algorithms in section 3.6.
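The first steps of this procedure, choosing the reference checkerboard from the sign of the staggered magnetization and collecting connected sets of deviating sites, can be sketched as follows. This is a minimal illustration and not the analysis code: the function names, the 0/1 image encoding (1 for a detected atom of the imaged spin species) and the rectangular image are assumptions, and the longest-path search, the endpoint check and the iterative trace-back described above are omitted.

```python
from collections import deque

import numpy as np


def reference_checkerboard(image):
    """Return the checkerboard (0/1) matching the sign of the staggered magnetization."""
    img = np.asarray(image, dtype=int)
    rows, cols = np.indices(img.shape)
    parity = (rows + cols) % 2            # 0 on one sublattice, 1 on the other
    sign = 1 - 2 * parity                 # +1 / -1 staggering factor
    m_stag = np.mean(sign * (2 * img - 1))
    ref = (parity == 0) if m_stag >= 0 else (parity == 1)
    return ref.astype(int), m_stag


def deviation_clusters(image):
    """Group sites deviating from the reference into nearest-neighbor clusters."""
    img = np.asarray(image, dtype=int)
    ref, _ = reference_checkerboard(img)
    deviates = img != ref
    seen = np.zeros(img.shape, dtype=bool)
    clusters = []
    for start in zip(*np.nonzero(deviates)):
        if seen[start]:
            continue
        seen[start] = True
        queue, cluster = deque([start]), []
        while queue:                      # breadth-first search over deviating sites
            r, c = queue.popleft()
            cluster.append((int(r), int(c)))
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                inside = 0 <= nr < img.shape[0] and 0 <= nc < img.shape[1]
                if inside and deviates[nr, nc] and not seen[nr, nc]:
                    seen[nr, nc] = True
                    queue.append((nr, nc))
        clusters.append(cluster)
    return clusters
```

Each returned cluster is a candidate region in which a string pattern could then be searched for.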
Signal at half-filling
Already at half filling, the spin state deviates from a classical checkerboard pattern. Reasons include thermal excitations, variations in the direction of the antiferromagnetic (AFM) ordering vector, doublon-hole pairs and spin exchanges. Contributions from thermal excitations as well as projection noise are reduced as far as possible while keeping a sufficient number of snapshots for reduced statistical uncertainty by post-selecting on a high staggered magnetization. Regardless, doublon-hole pairs appear in the experimental images as two empty sites, which are in most cases nearest neighbors since U ≫ t. After removing one spin species, two empty neighboring sites correspond to a single site deviating from the checkerboard pattern. Additionally, the exchange of two neighboring spins in a perfect checkerboard pattern leads to a deviation of two connected sites in the difference picture. These two effects thus lead mostly to string patterns of one or two sites in our detection algorithm. In order to avoid these contributions, short string patterns with two sites or fewer are excluded from the total string counts shown in Fig. 3 of the main text and in Figs. S3-S6, unless otherwise specified.
Apart from doublon-hole pairs, the Fermi-Hubbard model at half filling for U ≫ t can be approximated by the Heisenberg model. We can therefore compare the experimental string signal at half filling to images sampled via quantum Monte Carlo calculations of the Heisenberg model. In order to simulate the same boundary and finite size effects as in the experiment, we cut out a region of interest of the same size and shape in the quantum Monte Carlo images. We furthermore convert neighboring sites with opposite spins into doublon-hole pairs with a probability given by 4t²/U². Finally, we simulate the experimental readout and only keep information about one spin species, while doublons, holes and the other spin species appear as empty sites. In Fig. S2 we compare the string length distribution obtained with the algorithm described above for experimental half filling to quantum Monte Carlo simulations of the Heisenberg model at T = 0.6J with and without artificial doublon-hole pairs. The introduction of artificial doublon-hole pairs corrects the significant discrepancy between the QMC data and the experimental data at strings of length 1. The resulting simulated data agrees reasonably well with our experimental observations. At longer string lengths, there are statistically significant deviations between the QMC data with charge fluctuations and the measured string-pattern length distribution. This discrepancy may be due to a failure to consider the delocalization of doublon-hole pairs, which is expected at the experimental parameters. These quantum fluctuations become much rarer upon doping, where the QMC data with added strings agrees with the experimental measurements.
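The conversion of a Heisenberg snapshot into a simulated experimental image can be sketched as follows. This is only an illustration under stated assumptions: spins are encoded as +1/-1, the pair probability 4t²/U² is applied once per nearest-neighbor bond, bonds are visited in a fixed raster order (which a careful implementation would randomize), and the function name is hypothetical.

```python
import numpy as np


def simulate_readout(spins, t_over_U, rng, imaged_spin=1):
    """Insert artificial doublon-hole pairs, then keep only one spin species.

    spins: 2D array of +1/-1 values (assumed snapshot encoding).
    Returns a 0/1 image; doublons, holes and the other species appear as 0.
    """
    spins = np.asarray(spins)
    p_pair = 4.0 * t_over_U ** 2
    charged = np.zeros(spins.shape, dtype=bool)   # True for a doublon or a hole
    n_rows, n_cols = spins.shape
    for r in range(n_rows):
        for c in range(n_cols):
            for dr, dc in ((0, 1), (1, 0)):       # visit each bond once
                nr, nc = r + dr, c + dc
                if nr >= n_rows or nc >= n_cols:
                    continue
                if charged[r, c] or charged[nr, nc]:
                    continue                      # a site joins at most one pair
                if spins[r, c] != spins[nr, nc] and rng.random() < p_pair:
                    charged[r, c] = charged[nr, nc] = True
    return ((spins == imaged_spin) & ~charged).astype(int)
```

The resulting images can be passed through the same string detection as the experimental data.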

Detection efficiency
In Fig. 3A in the main text, the total number of detected string patterns of length greater than two sites is shown as a function of doping at temperatures below 0.7J. From the slope of a linear fit to the data for doping up to 16% we extract an increase in detected strings of 0.001 per site per percent doping. The analytically calculated string length distribution for a temperature of 0.6J predicts that 65% of string states have length greater than 2 sites, giving a predicted increase of detected string patterns of 0.0065 per site per percent doping. This results in a detection efficiency of 15.4% for the total number of detected strings with length greater than 2 sites, the string count. We now discuss several situations in which detected string patterns do not match the original string states, even with AFM order along the measurement axis. Given two neighboring empty sites at one end of a string pattern, due to the incomplete readout in the experiment it is impossible to determine which of the two sites was empty before the removal of one spin species. In our string detection algorithm, we assume that the hole causes a deviation from the classical checkerboard pattern and is therefore located on the sublattice occupied by the spin species that is not removed. However, a hole is equally likely to be on the other sublattice, where after imaging we cannot distinguish between hole and removed spin. Thus, for every string we detect, there is a 50% chance that it is actually one site longer. Specifically, strings of length 3 sites are detected as strings of length 2 half of the time. The string theory predicts 0.018 strings of length 3 per site at 10% doping. If we detect half of them as length-2 instead of length-3 objects and therefore do not count them in the total string number shown in Fig. 3A, this effect alone already explains approximately one-sixth of the missing signal.
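The arithmetic behind these numbers can be checked with a few lines. All inputs are the values quoted above; taking the "missing signal" at 10% doping to be the predicted minus the detected increase in string count is an interpretation made for this sketch.

```python
# Values quoted in the text
measured_slope = 0.001            # detected strings per site per percent doping
fraction_long = 0.65              # predicted fraction of strings longer than 2 sites
holes_per_site_per_percent = 0.01

predicted_slope = fraction_long * holes_per_site_per_percent
efficiency = measured_slope / predicted_slope          # detection efficiency

# Assumption: missing signal = predicted minus detected increase at 10% doping
doping_percent = 10
missing = (predicted_slope - measured_slope) * doping_percent
length3_lost = 0.5 * 0.018        # half of the length-3 strings per site
share_explained = length3_lost / missing

print(f"efficiency = {efficiency:.3f}, share explained = {share_explained:.3f}")
```

With these inputs the efficiency evaluates to roughly 0.154 and the share explained to roughly 0.16, in line with the 15.4% and one-sixth stated above.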
The detected string length can also change due to overlaps of the string with another string or one of the perturbations of the checkerboard pattern discussed above. In some cases, this will result in a string of length greater than 2 sites being detected as too short for the string count. In Fig. S3A, the total string count including all pattern lengths is shown. The detection efficiency increases to about 30%, consistent with the discussion above.
Large overlaps of a string with another string or perturbation can lead to the cancellation of both, which decreases the total number of strings found by 2. While overlaps and complete cancellations are rare events, the post-selection to a large value of the staggered magnetization favors images in which different effects cancel each other and thereby locally restore the checkerboard pattern.
If two strings lie adjacent to one another, they will very likely be detected either as one longer string, which is always favored by our algorithm, or as two strings with different lengths than the original objects. This may further decrease the total number of detected strings. In Fig. S3D, only strings of length 2 sites are simulated; nevertheless, the resulting string count depends on the number of these strings. Because the string count only includes patterns of lengths greater than two sites, the detected patterns seem to arise from input strings lying adjacent to one another or to a deviation in the checkerboard pattern. However, the simulated string count does not agree with experiment, suggesting that the simulated strings are on average of insufficient length. In Figs. S3E and F, the total string count is shown for a simulation of strings with only length 3 or 4 sites, respectively. The number of detected string patterns longer than 2 sites is close to the experimental value in the case of simulated strings of length 3. However, for simulated strings of length 4, the string count is slightly higher than the experimental data, indicating a sensitivity to the shape of the distribution function. This effect becomes quite clear in the infinite-string-length limit (see Fig. S3G).

Image generation with artificial strings
Given the background signal as well as the detection efficiency discussed above, we cannot expect the measured string length distribution to match the predictions from string theory directly. We can however simulate the effect of the analytic string length distribution from the geometric string theory and compare the resulting detected distribution to the experimental result. Because this theory makes no statement about the parent AFM and instead describes the relationship between half-filling and non-zero doping, in this simulation a number of holes corresponding to the doping value under consideration are placed at random positions into the experimental images taken at half filling. For each hole, a length is sampled from the analytic string length distribution and the hole is propagated accordingly. The direction is chosen randomly at each step, but the hole cannot move backwards. After preparing a sample of artificial string images at a given doping value we apply the same postselection and string detection as for the experimental images at finite doping. The detection scheme is now common to both the artificially generated images as well as the experimental images, thus allowing for a direct comparison of the results as presented in the main text.
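The hole propagation step can be sketched as follows. This is a minimal illustration, not the code used for the analysis: the function names, the 0/1 image encoding (0 for an empty site), the rectangular lattice and the handling of the lattice boundary are assumptions, and the string length (in bonds) is taken as an input rather than sampled from the analytic distribution.

```python
import numpy as np

MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))


def string_path(start, length_bonds, shape, rng):
    """Random hole trajectory with random direction at each step, never stepping back."""
    path = [tuple(start)]
    prev = None
    for _ in range(length_bonds):
        r, c = path[-1]
        options = []
        for dr, dc in MOVES:
            if prev is not None and (dr, dc) == (-prev[0], -prev[1]):
                continue                       # the hole cannot move backwards
            if 0 <= r + dr < shape[0] and 0 <= c + dc < shape[1]:
                options.append((dr, dc))
        if not options:
            break
        prev = options[rng.integers(len(options))]
        path.append((r + prev[0], c + prev[1]))
    return path


def apply_string(image, path):
    """Place a hole at path[0] and move it along path, displacing the atoms."""
    out = np.array(image, copy=True)
    out[path[0]] = 0                           # create the hole
    for a, b in zip(path[:-1], path[1:]):
        out[a], out[b] = out[b], out[a]        # atom at b hops into the vacated site a
    return out
```

Calling `apply_string(image, string_path(start, n, image.shape, rng))` for each randomly placed hole produces one artificial string image.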
Simulating string length distributions at significantly lower or higher temperatures or simulating only strings of a given length changes both quantities and leads to a worse agreement with the experimental result, see Fig. S3B. In the simulation, we can also change the participation ratio of holes in strings. As before, we place the holes at random positions into the system. Only the holes participating in strings are propagated according to the chosen string length distribution. The remaining holes are not moved and therefore constitute strings of length zero. Applying the same post-selection and detection procedure as before, the string count is lower than in the experimental images at finite doping, see Fig. S3C. We therefore conclude that under the geometric string picture, for consistency with experimental results all free holes partake in strings.

Average string length
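The average measured string length l(δ) in Fig. 3B of the main text is calculated from the string histograms as $l(\delta) = \sum_{l'} l' \, p_\delta(l') / \sum_{l'} p_\delta(l')$, where $p_\delta(l')$ is the measured probability of string-pattern length $l'$ at doping δ.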
Error bars are obtained via standard error propagation of the uncertainties in the measured string length probabilities.
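A minimal sketch of this calculation is given below. It assumes that the uncertainties of the individual histogram bins are uncorrelated, which the text does not state explicitly; the function name is hypothetical.

```python
import numpy as np


def average_string_length(lengths, probs, prob_errs):
    """Weighted mean string length and its propagated uncertainty."""
    l = np.asarray(lengths, dtype=float)
    p = np.asarray(probs, dtype=float)
    s = np.asarray(prob_errs, dtype=float)
    norm = p.sum()
    mean = (l * p).sum() / norm
    # Partial derivative of the mean with respect to p(l) is (l - mean) / norm;
    # bin uncertainties are assumed to be uncorrelated.
    err = np.sqrt(((((l - mean) / norm) ** 2) * s ** 2).sum())
    return mean, err
```

The normalization in the denominator matters because the histogram entering the average only contains the pattern lengths kept in the analysis.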
To illustrate the importance of the string patterns we compare the observed average string pattern length dependence on doping to a simpler model where geometric strings are absent: starting from the experimentally recorded pictures at half-filling for various temperatures, we randomly place holes up to a given level of doping. Running the string pattern detection algorithm on these pictures then allows us to obtain a prediction for the average string pattern length at that doping level (in the absence of geometric strings). We obtain this prediction for different temperatures, which we parameterize by the staggered magnetization m_z. In a final step we interpolate the obtained function l(m_z) with a linear fit and choose an 'effective' temperature by matching the staggered magnetization to the actually measured value in the experiment at that doping. As the dependence of the measured staggered magnetization on doping is non-linear, we perform a piecewise linear fit of the closest five data points for each doping value to obtain a reliable estimate of m_z. This entire procedure allows us to directly compare the experimental data at finite doping to a scenario where the same number of spins are flipped on average and the same number of doping holes are present, but no geometric strings are included. The error bar for the predicted average string length is obtained by combining the measurement error of m_z with the error of the linear fits weighted by the standard deviation of the measured quantities.
As a cross-verification, we have applied the same procedure except for choosing the 'effective' temperature by matching the value of the nearest-neighbor spin correlator. With this method we found the same qualitative behavior.

Temperature dependence of pattern detection
An increase of the temperature leads to a higher number of detected strings already at half filling, see main text Fig. 2, since thermal fluctuations lead to greater deviation from a classical checkerboard pattern. The faithful detection of string patterns due to doping is however only possible if the half-filling signal is sufficiently small. As shown in Fig. 3 in the main text, a significant difference in the string count between experimentally doped data and half filling data with holes sprinkled in is only visible up to T = J.
While the string length distribution predicted by analytic calculations is dominated by strings of length 0 and 1 for temperatures smaller than 0.5J, it continually broadens and longer strings are more likely to appear for increasing temperature, see Fig. S4A. However, from main text Fig. 3C it appears that effects of temperature on the parent AFM are dominant over changes in the analytic string length distribution.
When perturbations of the checkerboard pattern and strings start to overlap, the sum of detected strings saturates to a value of approximately 0.039 strings per site. In Fig. S4B we plot the string count as a function of temperature for different doping values. The temperature at which saturation occurs gets lower with increased doping, consistent with the string picture where every free hole creates a string and brings the total string count closer to saturation.

Robustness under other string detection algorithms
There are arguably many possible algorithms which can be used to quantify the presence of string patterns. Here we discuss two alternate algorithms based on the string model. We find that these algorithms are comparable in performance to the detection algorithm discussed in the main text, and determine our algorithm of choice based on simplicity.

Simplified difference method
The most straightforward way to detect string patterns is to simply count the continuously connected sites that deviate from the classical checkerboard pattern. As opposed to the algorithm we use in the main text, not every object identified as a string pattern in this way can actually be a geometric string. For example, it is possible that both endpoints as well as the sites surrounding them are occupied such that there cannot be a hole at either end. Moreover, the shape of the object may not be consistent with a non-branching string pattern. However, one can argue that these inaccuracies mainly occur at high temperature or high doping values when perturbations and strings start to overlap.
In Fig. S5A, the same quantities as plotted in main text Figs. 2B and 3A are shown, but under the simplified string detection algorithm instead. The slope of the total number of detected strings as a function of doping remains roughly the same as before, only the offset at half filling is slightly higher.

Happiness method
In the dilute string regime, where string states do not overlap or lie adjacent to one another, one can search for string patterns by also requiring that sites immediately surrounding the string maintain AFM order. This method is also susceptible to identifying string patterns caused by doublon-hole pairs, spin-exchange processes, and projective measurement. However, as these effects will introduce deviations from AFM order, this is perhaps the most conservative approach to finding string patterns.
This algorithm characterizes nearby order by labeling each site with the number of anti-aligned bonds it has with its nearest neighbors, termed the "happiness" of that site, for images with one spin species removed. For example, sites in a classical AFM would all be labelled with happiness 4, while a ferromagnet would have sites with happiness 0. As a hole moves through an AFM, sites which previously had happiness 4 will exhibit reduced happiness. Depending on the length of the string, sites within a string can be characterized by specific string patterns. Based on this, the algorithm takes images with one spin state removed and for each image, begins by storing all sites which could be the beginning of a string. For each candidate string beginning, it sees if there is a neighboring site which could be the next site in the string, given the happiness and spin occupation of that site. This process continues until the string cannot be propagated any further, at which point the algorithm searches for a neighboring site which could be the end of the string. Figure S5B shows the same quantities plotted in main text Figs. 2B and 3A, but under the happiness string detection algorithm instead. While the qualitative dependence of these observables on doping under this algorithm is comparable to that of the algorithm used in the main text, it is clear that the signal to noise ratio is significantly lower. This is not surprising, especially given that quantum fluctuations and projection noise do contribute considerably to measurement and reduce the sensitivity of string patterns to string states. However, the signal to noise here is too low to make a statement about the agreement between any of the theories considered and the experiment. 
Regardless, this algorithm also finds a saturation in the string count at a doping of approximately 15%, providing additional evidence that the saturation reflects a physical change in the system which is independent of the particular detection algorithm.
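The happiness label itself is simple to compute. The sketch below assumes 0/1 images with one spin species removed and counts a bond as anti-aligned when the two neighboring occupations differ, so holes and doublons are not distinguished from the removed species; sites at the image edge have fewer than four bonds. The function name is hypothetical, and the subsequent string search is not included.

```python
import numpy as np


def happiness(image):
    """Number of nearest-neighbor bonds with differing occupation, per site."""
    img = np.asarray(image, dtype=int)
    h = np.zeros(img.shape, dtype=int)
    h[:-1, :] += (img[:-1, :] != img[1:, :])   # bond to the neighbor below
    h[1:, :] += (img[1:, :] != img[:-1, :])    # bond to the neighbor above
    h[:, :-1] += (img[:, :-1] != img[:, 1:])   # bond to the neighbor on the right
    h[:, 1:] += (img[:, 1:] != img[:, :-1])    # bond to the neighbor on the left
    return h
```

For a perfect checkerboard every bulk site is labelled 4, and for a uniform image every site is labelled 0, matching the two limiting cases mentioned above.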

Robustness under different post-selection procedures
The ensemble of experimental measurements reflects the SU(2) symmetry of the underlying Hubbard Hamiltonian. We therefore post-select on the value of the staggered magnetization to favor measurements where the measurement axis is more closely aligned with the staggered spin-ordering vector. Due to the finite temperature, there is no long-range AFM order across the entire system, but only in areas spanning the size of the correlation length. For every image, we consider all possible positions of a window with the corresponding size of 7 sites across in the region of interest and choose the position with the highest absolute value of the staggered magnetization, see Fig. S1. From the resulting images, we post-select on the 60% of images with the highest absolute value of the staggered magnetization and from these images search for string patterns. By using a relative staggered magnetization cutoff rather than an absolute one, we avoid completely post-selecting out images which have string patterns, because these patterns necessarily reduce the staggered magnetization.
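The two post-selection steps can be sketched as follows. This is a minimal illustration under assumptions: images are 0/1 arrays with one spin species removed, the region of interest is treated as rectangular and the window as square, and all function names are hypothetical.

```python
import numpy as np


def staggered_magnetization(window):
    """Staggered magnetization of a 0/1 window, with occupations mapped to +1/-1."""
    rows, cols = np.indices(window.shape)
    sign = 1 - 2 * ((rows + cols) % 2)
    return np.mean(sign * (2 * window - 1))


def best_window(image, size=7):
    """Window position maximizing |m_stag|; returns (|m_stag|, row, col)."""
    img = np.asarray(image, dtype=int)
    best = None
    for r in range(img.shape[0] - size + 1):
        for c in range(img.shape[1] - size + 1):
            m = abs(staggered_magnetization(img[r:r + size, c:c + size]))
            if best is None or m > best[0]:
                best = (m, r, c)
    return best


def postselect(images, size=7, keep_fraction=0.6):
    """Keep the fraction of images with the largest |m_stag| in their best window."""
    scored = [(best_window(img, size), i) for i, img in enumerate(images)]
    scored.sort(key=lambda entry: entry[0][0], reverse=True)
    n_keep = max(1, int(round(keep_fraction * len(images))))
    return [(i, (r, c)) for (m, r, c), i in scored[:n_keep]]
```

`postselect` returns the indices of the retained images together with the chosen window positions, so the string search can then be restricted to those windows.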
The size of the window used in the first step of the post-selection procedure is chosen according to the correlation length at half filling. Varying the window size to a smaller or larger area does not change the qualitative result, see Fig. S6A.
As described above, the window is moved to the position within the region of interest which gives the highest possible value of the staggered magnetization. While the necessity to consider only a part of the actual image is due to the correlation length which is smaller than the system size, it is not essential to move the window to the optimal position. Instead, it could always be placed at the same position, for example in the center of the region of interest. However, this requires a stronger post-selection on the staggered magnetization afterwards (likely because of poorer alignment with the staggered spin-ordering vector) and therefore reduces the number of images in which to search for string patterns. In Fig. S6B, the result for a fixed window position is shown to be qualitatively very similar to the results shown in the main text. The error bars increase significantly due to the smaller number of analyzed snapshots.
Finally, the fixed fraction of images kept in the post-selection process can be varied. We choose this value in an effort to capture the tail of the histogram of the staggered magnetization. Changing the percentage of images in which we look for string patterns changes the string count at half filling accordingly. However, the slope as a function of doping remains the same as can be seen in Fig. S6C.

Phenomenological models
While the results of the main text indicate that the geometric string model predicts all observables considered with significantly greater success than the π-flux theory or simulations with sprinkled holes, it is also constructive to assess how well basic phenomenological models perform. Here we consider two.

Matching spin correlations
Here we begin with a random but balanced spin distribution. From this ensemble, we randomly place the desired number of doublon-hole pairs and holes according to the desired doping value. Finally, we flip spins randomly until the correlators C s (1) and C s ( √ 2) agree with the experimental data. From this dataset, we apply our string pattern detection algorithm to compare with experimental result. The region of interest of the dataset matches that of the experiment. We generate images corresponding to half-filling and to 10% doping in experiment.
Figures S7A-C show the measured string pattern length distribution, spin correlation function, and full counting statistics of the staggered magnetization for the generated images in comparison to the experimental result. Because we begin with spin distributions with no correlations and artificially introduce nearest-neighbor and diagonal next-nearest-neighbor correlations, it is not surprising that the correlation functions do not agree beyond short distances. In turn, because the spin correlation function is closely related to the average staggered magnetization, it is not surprising that the staggered magnetization distribution also does not agree and that the average value is lower for the generated data.
However, the string pattern length distributions do not match either. While there is agreement at short string pattern lengths, the generated images contain statistically significantly more long patterns than in the experiment, both for half-filling and 10% doping. Surprisingly, it seems that matching the first two correlators is insufficient to introduce the order needed to prevent long string patterns. Modifications to the phenomenological model such as beginning from a perfect checkerboard pattern with SU(2) symmetry do not increase the level of agreement. Furthermore, the contributions of making C s ( √ 2) match experiment (in addition to C s (1)) do not affect the results significantly.

Corrections to a classical AFM
We also apply a phenomenological approach where we begin with a classical checkerboard, create singlets with some variable density, and place doublon-hole pairs and holes randomly according to the desired doping value. We finally apply a projective measurement process and, ensuring that the region of interest is the same as in experiment, run the string search algorithm on the result. Figures S7D-F show that while the density of singlets can be varied to achieve reasonable agreement for the spin correlation function and staggered magnetization full counting statistics at 10% doping, this agreement does not translate into the same level of agreement for the string pattern length distribution. Again there appears to be an excess of long patterns, as seen in section 3.8.1 from matching C s (1) and C s ( √ 2). Furthermore, it is clear that keeping the same density of singlets for half-filling results in stark disagreement across all observables.

Comparison to full resolution
In images taken in our experiment, we cannot distinguish between holes, doublons and the removed spin species. As a consequence, we find string patterns at half filling where theoretically there cannot be any strings and furthermore, at finite doping the increase in the total string count corresponds to only about one-fifth of the number of holes. In a system with full resolution where the hole positions are known, the number of found strings according to our algorithm has to correspond to the number of holes. However, the detection of the distribution of string lengths can still be modified by overlaps between string patterns in the same way as in our experiment. We can simulate full spin readout starting from quantum Monte Carlo simulations of the Heisenberg model and moving randomly placed holes through the spin background according to a given string length distribution. In the case of full resolution, we slightly modify our algorithm and accept strings only if they have a hole at one end. In Fig. S8 a comparison of the detected string length distribution with and without full resolution is shown. While, as expected, the signal with full resolution is a factor of about five higher, the relative distribution of the detected string-pattern lengths remains the same.

Geometric-string theory
To describe the effect that hole doping has on the AFM at half filling, we neglect correlations between dopants and consider the case of a single hole first. Our starting point is the undoped Heisenberg spin model at half filling, which we describe by a thermal density matrix $\rho_{1/2} = e^{-\beta \hat{H}_H}/Z_{1/2}$, where $\beta = 1/k_B T$ with the Boltzmann constant $k_B$ and temperature $T$, $Z_{1/2}$ ensures normalization, and $\hat{H}_H = J \sum_{\langle i,j \rangle} \hat{\mathbf{S}}_i \cdot \hat{\mathbf{S}}_j$ denotes the Heisenberg Hamiltonian with antiferromagnetic coupling $J$ between spins $\hat{\mathbf{S}}$ on neighboring sites $\langle i,j \rangle$ of a square lattice. When modeling the correlations between the mobile hole and the surrounding spins, we apply the frozen-spin approximation introduced at zero temperature in Refs. [S2, S6]. To describe the motion of the hole, we introduce an approximate basis generated by string states. For example, the trivial string state $|j,\sigma,0\rangle$ with length $\ell = 0$ and spin $\sigma$ is obtained by annihilating a fermion with spin $\sigma$ at some lattice site $j$, i.e. $|j,\sigma,0\rangle = \hat{c}_{j,\sigma} |\Psi_{1/2}\rangle$, where $|\Psi_{1/2}\rangle$ denotes any typical undoped state from the ensemble described by $\rho_{1/2}$. Non-trivial strings $\Sigma$, defined as finite trajectories on a square lattice without self-retracing components, correspond to sites on a fractal Bethe lattice, or Cayley tree, with coordination number $z = 4$. Every such string labels a separate approximate basis state, $|j,\sigma,\Sigma\rangle = \hat{G}_\Sigma |j,\sigma,0\rangle$; the string operator $\hat{G}_\Sigma$ starts from the original position $j$ of the hole and moves it along the trajectory described by $\Sigma$, while displacing all spins along the way accordingly.
If $|\Psi_{1/2}\rangle$ is the classical Néel state, the string states $|j,\sigma,\Sigma\rangle$ form an orthonormal basis, except for certain loop configurations which were first identified by Trugman [S18] and lead to double counting of some states. As shown in Ref. [S2], however, one may assume that all states $|j,\sigma,\Sigma\rangle$ are mutually orthonormal; the dominant effect of Trugman loops can be captured by adding corrections to the hole dispersion. If $|\Psi_{1/2}\rangle$ describes the ground state of the quantum Heisenberg AFM, the approximation that all states $|j,\sigma,\Sigma\rangle$ are mutually orthonormal still holds; using exact diagonalization in a 4 × 4-site system with periodic boundary conditions we verify that state overlaps remain $\ll 1$ except for Trugman loop configurations. In fact, for any state with strong local AFM correlations, we expect that this approximation is valid because the motion of the hole imprints a significant memory of its trajectory in the surrounding spin environment. This is found to be true even in a completely disordered spin environment at infinite temperature [S19], at least on a qualitative level. Because all typical states $|\Psi_{1/2}\rangle$ from the ensemble described by $\rho_{1/2}$ have significant local AFM order, we will assume in the following that the set of states $|j,\sigma,\Sigma\rangle$ forms an orthonormal basis which defines the effective Hilbert space of the geometric string theory.
Next we derive the effective Hamiltonian. For simplicity, we consider the $t$-$J$ Hamiltonian $\hat{H}_{t-J} = \hat{H}_t + \hat{H}_J$, which provides an approximate low-energy description of the Fermi-Hubbard model when $U \gg t$. The first term, $\hat{H}_t \propto t$, introduces couplings between string states $\Sigma$, $\Sigma'$ corresponding to holes tunneling to neighboring sites on the Bethe lattice: $\hat{H}^{\Sigma}_t = -t \sum_{\langle \Sigma, \Sigma' \rangle} |j,\sigma,\Sigma'\rangle \langle j,\sigma,\Sigma| + \mathrm{h.c.}$ The spin-exchange part of the Hamiltonian, $\hat{H}_J \propto J$, only depends on the spin configuration in the lattice. Because the strings distort this configuration, they can be associated with a finite potential energy $V_{\rm pot}(\Sigma) = \langle j,\sigma,\Sigma| \hat{H}_J |j,\sigma,\Sigma\rangle$. In general, this expression depends on the specifics of the string configuration $\Sigma$. To simplify our model, we neglect self-interactions of the string and assume a linear string potential depending only on the string length $\ell_\Sigma$; thus in our effective model we consider the Hamiltonian $\hat{H}^{\Sigma}_J = \sum_\Sigma V_{\rm pot}(\ell_\Sigma) \, |j,\sigma,\Sigma\rangle \langle j,\sigma,\Sigma|$. The potential is derived by considering only straight strings, which yields $V_{\rm pot}(\ell_\Sigma) = (dE/d\ell) \, \ell_\Sigma + g_0 \, \delta_{\ell_\Sigma,0} + \mu_h$; the linear string tension is $dE/d\ell = 2J \, [C_s(\sqrt{2}) - C_s(1)]$, where $C_s(d)$ is the spin-spin correlator at distance $d$ as defined in the main text but for the undoped system, and the attractive potential $g_0 = -J \, [C_s(2) - C_s(1)]$ favors short strings. The term $\mu_h = J \, [1 + C_s(2) - 5 C_s(1)]$ denotes an overall energy offset which is irrelevant for our purposes. The most extreme self-interactions of the string, caused by loop configurations, are not expected to invalidate the geometric string approach; rather, they modify the hole dispersion and lead to additional dressing of the string with magnetic fluctuations [S2].
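For concreteness, the linear string potential can be evaluated from the three spin correlators of the undoped system as in the sketch below. The function and argument names are hypothetical, and the correlator values in the usage note are placeholders chosen for illustration, not results from this work.

```python
def string_potential(length_bonds, J, C1, C_diag, C2):
    """Linear string potential V_pot for a string of given length (in bonds).

    C1, C_diag and C2 stand for the undoped spin correlators at distances
    1, sqrt(2) and 2 (argument names are assumptions of this sketch).
    """
    tension = 2.0 * J * (C_diag - C1)      # dE/dl, linear string tension
    g0 = -J * (C2 - C1)                    # attractive term acting only at zero length
    mu_h = J * (1.0 + C2 - 5.0 * C1)       # overall energy offset
    delta_term = g0 if length_bonds == 0 else 0.0
    return tension * length_bonds + delta_term + mu_h
```

With placeholder values such as `C1 = -0.3`, `C_diag = 0.125` and `C2 = 0.1` (in units where `J = 1`), the tension is positive and `g0` is negative, so the potential grows linearly with length while the zero-length state is additionally lowered in energy.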
Using the effective geometric string Hamiltonian $\hat{H}_\Sigma = \hat{H}^{\Sigma}_t + \hat{H}^{\Sigma}_J$ introduced above, we can calculate the expected string length distribution. We consider a thermal state $\rho_\Sigma = e^{-\beta \hat{H}_\Sigma}/Z_\Sigma$ for the string part. The overall state $\rho = \rho_{1/2} \otimes \rho_\Sigma$ factorizes, and we use the experimental temperature $T = 0.6J$ throughout. This fixes the string tension $dE/d\ell = 0.85J$, which we obtain by calculating the finite-temperature spin correlations $C_s(1)$, $C_s(\sqrt{2})$ in the undoped Heisenberg model using a standard quantum Monte Carlo code from the ALPS package. We keep track of the exponentially large string Hilbert space by making use of the discrete rotational symmetries of the Bethe lattice, which are present when the string potential depends only on the length $\ell_\Sigma$ of the string [S2]. The resulting string length distribution is shown in Fig. 1B of the main text; there, however, we show string lengths in units of sites rather than the bond count: the length $\ell$ of a string (in sites) is related to the length $\ell_\Sigma$ (in bonds) as $\ell = \ell_\Sigma + 1$.
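The role of the rotational symmetries can be illustrated with a short sketch. For a potential that depends only on the string length, the Hamiltonian on a Bethe lattice truncated at `l_max` bonds decomposes into independent one-dimensional chains: one fully symmetric chain starting at length 0, and, for every branching point at depth n, a set of non-symmetric chains starting at length n + 1. This is a standard decomposition for tree graphs and is not necessarily the exact procedure of Ref. [S2]; the truncation `l_max`, the value of `g0`, the hopping `t = 2J`, and the function name are assumptions made for this example.

```python
import numpy as np


def string_length_distribution(t, tension, g0, temperature, l_max, z=4):
    """Thermal distribution of string lengths (in bonds) on a truncated
    Bethe lattice with coordination number z, for l_max >= 1."""
    def potential(length):
        return tension * length + (g0 if length == 0 else 0.0)

    # Each sector is (first string length in the chain, multiplicity).
    sectors = [(0, 1.0), (1, float(z - 1))]  # symmetric chain, root rotations
    for n in range(1, l_max):
        nodes_at_depth = z * (z - 1) ** (n - 1)
        sectors.append((n + 1, float(nodes_at_depth * (z - 2))))

    weights = np.zeros(l_max + 1)
    for start, multiplicity in sectors:
        lengths = np.arange(start, l_max + 1)
        h = np.diag([potential(length) for length in lengths]).astype(float)
        for k, length in enumerate(lengths[:-1]):
            # Effective hopping between shells: sqrt(number of new branches).
            amp = -t * np.sqrt(z if length == 0 else z - 1)
            h[k, k + 1] = h[k + 1, k] = amp
        energies, vecs = np.linalg.eigh(h)
        boltzmann = np.exp(-energies / temperature)
        # Sum over eigenstates of |psi_n(l)|^2 exp(-E_n / T) for each length.
        weights[start:] += multiplicity * (vecs ** 2) @ boltzmann
    return weights / weights.sum()


# Example call in units of J = 1 (g0 and l_max are placeholder choices).
p_bonds = string_length_distribution(t=2.0, tension=0.85, g0=-0.4,
                                     temperature=0.6, l_max=10)
lengths_in_sites = np.arange(len(p_bonds)) + 1  # l = l_Sigma + 1
```

Dropping all sectors except the first corresponds to ignoring rotational excitations. Since the truncation is an assumption of this sketch, the result should be checked for convergence in `l_max` before it is compared with any data.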
A few comments are in order. First, we fix the quantum numbers $\sigma$ and $j$ specifying the beginning of the geometric string. However, spin-exchange processes introduce matrix elements between states with different initial positions, $|j, \sigma, \Sigma\rangle$ and $|j', \sigma, \Sigma\rangle$, with a strength $\propto J$ smaller than the dominant hopping amplitude $t > J$. As a result of such processes, we expect that $j$ can be chosen randomly. Second, the beginnings of different fluctuating strings are expected to become correlated at sufficiently low temperatures. However, since their dynamics is determined by an energy scale $J$, and the experimental temperature is of similar order of magnitude, we expect that such correlations between $j_1$ and $j_2$ associated with two different holes can be neglected in the current experimental regime. Third, thermal excitations of the fluctuating strings include vibrational and rotational [S2] excitations. If rotational excitations are ignored, a significantly narrower string length distribution is obtained, which is dominated by quantum fluctuations. Indeed, at somewhat higher temperatures $T \approx 0.8J$, we find from our effective model that the string length diverges because the free energy can be reduced by creating a high-entropy state with exponentially many rotational excitations. This transition is predicted in a regime where the experimental sample is too hot to measure a string signal which differs significantly from an infinite-temperature state.
Our approach is based on earlier work by Bulaevskii et al. [S3] and later by Brinkman and Rice [S4] and Trugman [S18], where similar calculations with strings have been performed at zero temperature and considering a classical Néel state. The frozen-spin approximation represents an approximate way of generalizing these results to situations with quantum and thermal fluctuations. The obtained trial wavefunction can also be interpreted as a microscopic formulation of the meson picture of magnetic polarons: instead of the most common description of holes as heavily dressed by magnetic fluctuations [S9-S12, S20], this theory (originally proposed by Béran et al. [S5] using phenomenological arguments) describes the doped holes as bound states of spin-less chargons and charge-neutral spinons. Including the properties of the spinon, located at the opposite end of the geometric string from the chargon, is essential for recovering the known microscopic properties of a single hole in an AFM. On a macroscopic level, the geometric-string theory discussed here describes a fermionic gas of mesons, a candidate state which has also been proposed for the elusive pseudogap phase in cuprates [S21-S23].

π-flux theory
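We use Metropolis Monte Carlo sampling to obtain Fock states of fermions described by the Gutzwiller projected thermal density matrix $\hat{\rho} = P_{\rm GW}\, e^{-\hat{H}_{\rm MF}/k_B T}\, P_{\rm GW}$ determined by the quadratic Hamiltonian

$$\hat{H}_{\rm MF} = [\ldots] \sum_{i,\sigma} \left( [\ldots]\, \hat{c}_{i+x,\sigma} + e^{i\theta_0}\, \hat{c}^{\dagger}_{i,\sigma} \hat{c}_{i+y,\sigma} + \mathrm{h.c.} \right). \qquad \text{(S3)}$$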

Free chargon theory
For reference, we consider a purely phenomenological theory of free fermionic chargons in the intermediate doping regime $\delta > 5\%$. Theoretically, it is motivated by the possibility that spinon-chargon pairs unbind and a deconfined phase of chargons may be realized. For simplicity we consider free fermions, although qualitatively similar anti-correlations would be expected for a gas of bosonic chargons with hard-core repulsion. More informed theoretical work has also proposed the possibility of a non-trivial metallic state of chargons [S8]. In our present work, we use a free chargon theory to calculate the hole (or anti-moment) correlations for comparison. The phenomenological model assumes point-like fermionic chargons $\hat{h}_j$ on the square lattice, with an effective Hamiltonian $\hat{H}_{\rm ch} = -t \sum_{\langle i, j \rangle} \left( \hat{h}^{\dagger}_j \hat{h}_i + \mathrm{h.c.} \right)$ and with the largest conceivable hopping strength $t$ between neighboring sites. The chargon-chargon correlations are then calculated from a simple thermal state $\rho_{\rm ch} = e^{-\beta \hat{H}_{\rm ch}}$ with $\beta = 1/k_B T$ for the experimental temperature $T = 0.6J$ and $t = 2J$, see Fig. 5 in the main text.
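A minimal sketch of such a free-fermion calculation is given below. It assumes a periodic `size` × `size` square lattice, a single fermion species with dispersion $-2t(\cos k_x + \cos k_y)$, and a chemical potential fixed by bisection to reach the target doping; the connected density correlations then follow from Wick's theorem as $-|\langle \hat{h}^{\dagger}_i \hat{h}_j \rangle|^2$ for $i \neq j$. The lattice size, the example doping, and the function name are assumptions of this illustration, not details taken from the analysis.

```python
import numpy as np


def free_chargon_correlations(doping, t, temperature, size=16):
    """Connected chargon density correlations on a periodic square lattice.

    Returns (corr, mu), where corr[dx, dy] is the connected correlator
    <n_i n_j> - <n_i><n_j> at displacement (dx, dy).
    """
    k = 2.0 * np.pi * np.arange(size) / size
    kx, ky = np.meshgrid(k, k, indexing="ij")
    eps = -2.0 * t * (np.cos(kx) + np.cos(ky))

    def occupation(mu):
        return 1.0 / (np.exp((eps - mu) / temperature) + 1.0)

    # Bisection for the chemical potential that gives the target density.
    lo = eps.min() - 20.0 * temperature
    hi = eps.max() + 20.0 * temperature
    for _ in range(100):
        mu = 0.5 * (lo + hi)
        if occupation(mu).mean() < doping:
            lo = mu
        else:
            hi = mu

    # One-body correlator G(d) = (1/N) sum_k n_k exp(i k.d).
    g = np.fft.ifft2(occupation(mu))
    corr = -np.abs(g) ** 2                  # Wick's theorem for d != 0
    corr[0, 0] = doping * (1.0 - doping)    # on-site variance of a fermion
    return corr, mu


# Example in units of J = 1; the doping value is a placeholder.
corr, mu = free_chargon_correlations(doping=0.10, t=2.0, temperature=0.6)
nearest_neighbor = corr[1, 0]
```

The correlator is non-positive at every nonzero distance, which is the anti-correlation expected for free fermions.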

Point-like magnetic polaron theory
We compare the experimentally measured anti-moment correlations to a model of free point-like magnetic polarons with the known dispersion of a free hole in an AFM [S11, S12]. To this end we consider a model of free, point-like, fermionic magnetic polarons $\hat{m}_j$ on the square lattice, with a momentum-space Hamiltonian $\hat{H}_{\rm mp} = \sum_{k} \hat{m}^{\dagger}_k \hat{m}_k\, \epsilon_{\rm mp}(k)$. The dispersion relation was approximated as:

The data presented in Fig. 4 of the main text include samples with temperatures between 0.5J and 0.7J, binned by doping values with 2% resolution. In Fig. S9A we plot $C_s(1)$, $C_s(\sqrt{2})$, and $C_s(2)$ versus doping for each continuously measured experimental dataset, where colorbars for each quantity denote temperature. While it is clear that lower temperatures are accompanied by stronger correlations, crucially one can see that the zero crossing of $C_s(\sqrt{2})$ persists across the entire temperature range included.
We can also compare the experimentally measured $C_s(1)$ versus doping to determinantal quantum Monte Carlo calculations of the Hubbard model on an 8 × 8 homogeneous square lattice using the Quantum Electron Simulation Toolbox, see Fig. S9B [S16]. Agreement between the two indicates that our experimental approach to doping the system does not increase the temperature of the sample beyond experimental uncertainty.

Spin correlator mixing: comparison to other theories
As an extension of Fig. 4A of the main text, in Fig. S10 we plot $C_s(d)$ for $d = \sqrt{2}$, 2, $\sqrt{8}$, and 3 for experiment, geometric strings, and π-flux states. While statistical uncertainty makes it challenging to quantitatively compare experiment with theory, on a qualitative level it appears that strings explain the experimental dependence on doping more closely than π-flux states do. In fact, under the geometric-string picture, the overestimation of the experimentally measured $C_s(\sqrt{2})$ by strings can be explained by the string model's underestimation of $C_s(1)$; for a comparable mixing rate, weakened $C_s(1)$ character would lead to a reduced decrease in $C_s(\sqrt{2})$ with doping. For larger-distance correlators, it also appears that not all of them exhibit a sign change with doping, which should not be surprising given that beyond the nearest-neighbor correlator, correlation strengths are much more similar in magnitude.

FIG. S10. Larger-distance spin-spin correlators $C_s(d)$ from experiment (left), geometric-string theory (middle), and π-flux states (right). Strings seem to explain the experimental data more closely than π-flux states.