Factorized Point Process Intensities: A Spatial Analysis of Professional Basketball Andrew Miller School of Engineering and Applied Sciences, Harvard University, Cambridge, USA Luke Bornn Department of Statistics, Harvard University, Cambridge, USA Ryan Adams School of Engineering and Applied Sciences, Harvard University, Cambridge, USA Kirk Goldsberry Center for Geographic Analysis, Harvard University, Cambridge, USA ACM @ SEAS . HARVARD . EDU BORNN @ STAT. HARVARD . EDU RPA @ SEAS . HARVARD . EDU KGOLDSBERRY @ FAS . HARVARD . EDU Abstract We develop a machine learning approach to represent and analyze the underlying spatial structure that governs shot selection among professional basketball players in the NBA. Typically, NBA players are discussed and compared in an heuristic, imprecise manner that relies on unmeasured intuitions about player behavior. This makes it difficult to draw comparisons between players and make accurate player specific predictions. Modeling shot attempt data as a point process, we create a low dimensional representation of offensive player types in the NBA. Using non-negative matrix factorization (NMF), an unsupervised dimensionality reduction technique, we show that a low-rank spatial decomposition summarizes the shooting habits of NBA players. The spatial representations discovered by the algorithm correspond to intuitive descriptions of NBA player types, and can be used to model other spatial effects, such as shooting accuracy. effective way to characterize a Poisson process, and λ(·) itself is typically of interest. Nonparametric methods to fit intensity functions are often desirable due to their flexibility and expressiveness, and have been explored at length (Cox, 1955; Møller et al., 1998; Diggle, 2013). Nonparametric intensity surfaces have been used in many applied settings, including density estimation (Adams et al., 2009), disease mapping (Benes et al., 2002), and models of neural spiking (Cunningham et al., 2008). When data are related realizations of a Poisson process on the same space, we often seek the underlying structure that ties them together. In this paper, we present an unsupervised approach to extract features from instances of point processes for which the intensity surfaces vary from realization to realization, but are constructed from a common library. The main contribution of this paper is an unsupervised method that finds a low dimensional representation of related point processes. Focusing on the application of modeling basketball shot selection, we show that a matrix decomposition of Poisson process intensity surfaces can provide an interpretable feature space that parsimoniously describes the data. We examine the individual components of the matrix decomposition, as they provide an interesting quantitative summary of players’ offensive tendencies. These summaries better characterize player types than any traditional categorization (e.g. player position). One application of our method is personnel decisions. Our representation can be used to select sets of players with diverse offensive tendencies. This representation is then leveraged in a latent variable model to visualize a player’s field goal percentage as a function of location on the court. 1. Introduction The spatial locations of made and missed shot attempts in basketball are naturally modeled as a point process. The Poisson process and its inhomogeneous variant are popular choices to model point data in spatial and temporal settings. Inferring the latent intensity function, λ(·), is an Proceedings of the 31 st International Conference on Machine Learning, Beijing, China, 2014. JMLR: W&CP volume 32. Copyright 2014 by the author(s). Factorized Point Process Intensities: A Spatial Analysis of Professional Basketball 1.1. Related Work Previously, Adams et al. (2010) developed a probabilistic matrix factorization method to predict score outcomes in NBA games. Their method incorporates external covariate information, though they do not model spatial effects or individual players. Goldsberry & Weiss (2012) developed a framework for analyzing the defensive effect of NBA centers on shot frequency and shot efficiency. Their analysis is restricted, however, to a subset of players in a small part of the court near the basket. Libraries of spatial or temporal functions with a nonparametric prior have also been used to model neural data. Yu et al. (2009) develop the Gaussian process factor analysis model to discover latent ‘neural trajectories’ in high dimensional neural time-series. Though similar in spirit, our model includes a positivity constraint on the latent functions that fundamentally changes their behavior and interpretation. We use this smoothness property to encode our inductive bias that shooting habits vary smoothly over the court space. For a more thorough treatment of Gaussian processes, see Rasmussen & Williams (2006). 2.2. Poisson Processes A Poisson process is a completely spatially random point process on some space, X , for which the number of points that end up in some set A ⊆ X is Poisson distributed. We will use an inhomogeneous Poisson process on a domain X . That is, we will model the set of spatial points, x1 , . . . , xN with xn ∈ X , as a Poisson process with a non-negative intensity function λ(x) : X → R+ (throughout this paper, R+ will indicate the union of the positive reals and zero). This implies that for any set A ⊆ X , the number of points that fall in A, NA , will be Poisson distributed, NA ∼ Poiss A λ(dA) . (2) 2. Background This section reviews the techniques used in our point process modeling method, including Gaussian processes (GPs), Poisson processes (PPs), log-Gaussian Cox processes (LGCPs) and non-negative matrix factorization (NMF). 2.1. Gaussian Processes A Gaussian process is a stochastic process whose sample path, f1 , f2 · · · ∈ R, is normally distributed. GPs are frequently used as a probabilistic model over functions f : X → R, where the realized value fn ≡ f (xn ) corresponds to a function evaluation at some point xn ∈ X . The spatial covariance between two points in X encode prior beliefs about f ; covariances can encode beliefs about a wide range of properties, including differentiability, smoothness, and periodicity. As a concrete example, imagine a smooth function f : R2 → R for which we have observed a set of locations x1 , . . . , xN and values f1 , . . . , fN . We can model this ‘smooth’ property by choosing a covariance function that results in smooth processes. For instance, the squared exponential covariance function cov(fi , fj ) = k(xi , xj ) = σ 2 exp − 1 ||xi − xj ||2 2 φ2 (1) Furthermore, a Poisson process is ‘memoryless’, meaning that NA is independent of NB for disjoint subsets A and B. We signify that a set of points x ≡ {x1 , . . . , xN } follows a Poisson process as x ∼ PP(λ(·)). (3) One useful property of the Poisson process is the superposition theorem (Kingman, 1992), which states that given a countable collection of independent Poisson processes x1 , x2 , . . . , each with measure λ1 , λ2 , . . . , their superposition is distributed as ∞ ∞ xk k=1 ∼ PP k=1 λk . (4) Furthermore, note that each intensity function λk can be scaled by some non-negative factor and remain a valid intensity function. The positive scalability of intensity functions and the superposition property of Poisson processes motivate the non-negative decomposition (Section 2.4) of a global Poisson process into simpler weighted subprocesses that can be shared between players. 2.3. Log-Gaussian Cox Processes A log-Gaussian Cox process (LGCP) is a doubly-stochastic Poisson process with a spatially varying intensity function modeled as an exponentiated GP Z(·) ∼ λ(·) ∼ x1 , . . . , x N ∼ GP(0, k(·, ·)) exp(Z(·)) PP(λ(·)) (5) (6) (7) assumes the function f is infinitely differentiable, with marginal variation σ 2 and length-scale φ, which controls the expected number of direction changes the function exhibits. Because this covariance is strictly a function of the distance between two points in the space X , the squared exponential covariance function is said to be stationary. where doubly-stochastic refers to two levels of randomness: the random function Z(·) and the random point process with intensity λ(·). Factorized Point Process Intensities: A Spatial Analysis of Professional Basketball 2.4. Non-Negative Matrix Factorization Non-negative matrix factorization (NMF) is a dimensionality reduction technique that assumes some matrix Λ can be approximated by the product of two low-rank matrices Λ = WB (8) where the matrix Λ ∈ RN ×V is composed of N data points + of length V , the basis matrix B ∈ RK×V is composed of + K basis vectors, and the weight matrix W ∈ RN ×K is + composed of the N non-negative weight vectors that scale and linearly combine the basis vectors to reconstruct Λ. Each vector can be reconstructed from the weights and the bases K Due to the positivity constraint, the basis B∗ tends to be disjoint, exhibiting a more ‘parts-based’ decomposition than other, less constrained matrix factorization methods, such as PCA. This is due to the restrictive property of the NMF decomposition that disallows negative bases to cancel out positive bases. In practice, this restriction eliminates a large swath of ‘optimal’ factorizations with negative basis/weight pairs, leaving a sparser and often more interpretable basis (Lee & Seung, 1999). 3. Data Our data consist of made and missed field goal attempt locations from roughly half of the games in the 2012-2013 NBA regular season. These data were collected by optical sensors as part of a program to introduce spatio-temporal information to basketball analytics. We remove shooters with fewer than 50 field goal attempts, leaving a total of about 78,000 shots distributed among 335 unique NBA players. We model a player’s shooting as a point process on the offensive half court, a 35 ft by 50 ft rectangle. We will index players with n ∈ {1, . . . , N }, and we will refer to the set of each player’s shot attempts as xn = {xn,1 , . . . , xn,Mn }, where Mn is the number of shots taken by player n, and xn,m ∈ [0, 35] × [0, 50]. When discussing shot outcomes, we will use yn,m ∈ {0, 1} to indicate that the nth player’s mth shot was made (1) or missed (0). Some raw data is graphically presented in Figure 1(a). Our goal is to find a parsimonious, yet expressive representation of an NBA basketball player’s shooting habits. 3.1. A Note on Non-Stationarity As an exploratory data analysis step, we visualize the empirical spatial correlation of shot counts in a discretized space. We discretize the court into V tiles, and compute X such that Xn,v = |{xn,i : xn,i ∈ v}|, the number of shots by player n in tile v. The empirical correlation, depicted with respect to a few tiles in Figure 2, provides some intuition about the non-stationarity of the underlying intensity surfaces. Long range correlations exist in clearly non-stationary patterns, and this inductive bias is not captured by a stationary LGCP that merely assumes a locally smooth surface. This motivates the use of an additional method, such as NMF, to introduce global spatial patterns that attempt to learn this long range correlation. λn = k=1 Wn,k Bk,: . (9) The optimal matrices W∗ and B∗ are determined by an optimization procedure that minimizes (·, ·), a measure of reconstruction error or divergence between WB and Λ with the constraint that all elements remain non-negative: W∗ , B∗ = arg min (Λ, WB). W,B≥0 (10) Different metrics will result in different procedures. For arbitrary matrices X and Y, one option is the squared Frobenius norm, 2 (X, Y) = i,j (Xij − Yij )2 . (11) Another choice is a matrix divergence metric KL (X, Y) = i,j Xij log Xij − Xij + Yij Yij (12) which reduces to the Kullback-Leibler (KL) divergence when interpreting matrices X and Y as discrete distributions, i.e., ij Xij = ij Yij = 1 (Lee & Seung, 2001). Note that minimizing the divergence KL (X, Y) as a function of Y will yield a different result from optimizing over X. The two loss functions lead to different properties of W∗ and B∗ . To understand their inherent differences, note that the KL loss function includes a log ratio term. This tends to disallow large ratios between the original and reconstructed matrices, even in regions of low intensity. In fact, regions of low intensity can contribute more to the loss function than regions of high intensity if the ratio between them is large enough. The log ratio term is absent from the Frobenius loss function, which only disallows large differences. This tends to favor the reconstruction of regions of larger intensity, leading to more basis vectors focused on those regions. 4. Proposed Approach Our method ties together the two ideas, LGCPs and NMF, to extract spatial patterns from NBA shooting data. Factorized Point Process Intensities: A Spatial Analysis of Professional Basketball Stephen Curry (940 shots) q qq qqqqq q q qq q q q q qq q q q q q qqq q q qq q q q qq q q q q q qq q q q q q q qq q q qq q qq q q qqqq q q q qq qq q qq qq q q qq q q q qq q qq q qq q q q q q q qq q qq q q q q q q q q q q q qq q q q qq q q qq q qq qq q q q q q q q q q q q q q qq qq q qq q q qq q qq qq q qqq q q q q q q qq qq q q q qq q q q qqq qq q qq qq qq q qq q qq q q q qqqq q q q q q q q q q q q qq q q q q qqq q q q qqq qq q qq q q q q q q qq q q qq q q qqqq q q q qq q q q q q qq q q q qq qqq q qqq q q q qq q q q qq q q q q qq qq q q q qq q q q q q q q q q qq q qq q q q q qq qq q q q q q q q q q qq q q q q qqq q q q qq q q q q qqqq q q q qqq q qqqq qq q qq q q q q qq q q qq q q q q q qq LeBron James (315 shots) q q q q q q q q q q q q q q Emp. cor. at (21, 44) Emp. cor. at ( 7, 28) 1.0 0.8 0.6 1.0 0.8 q q q qq q q q q q q q q q qq q q q q q q q q q qq q q q q q q qq q q q q q q qq q q q qq qqqq q q qq q qqqq q q q q q q q qqq qq q qqq q q qq qq q q q qqq q qqq qq q q q qq q q q q qq q q q q qq q q q q q q q qq q q q q q 0.4 0.2 0.0 q q 0.6 0.4 0.2 0.0 (a) points Stephen Curry shot grid 10 10 8 8 6 6 4 2 0 LeBron James shot grid Figure 2. Empirical spatial correlation in raw count data at two marked court locations. These data exhibit non-stationary correlation patterns, particularly among three point shooters. This suggests a modeling mechanism to handle the global correlation. q 4 2 0 q ¯ λn has been normalized s.t. ¯ λn = 1 4. Find B, W for some K such that WB ≈ Λ, constraining all matrices to be non-negative (NMF). This results in a spatial basis B and basis loadings for each individual player, wn . Due to the superposition property of Poisson processes and the non-negativity of the basis and loadings, the basis vectors can be interpreted as subintensity functions, or archetypal intensities used to construct each individual player. The linear weights for each player concisely summarize the spatial shooting habits of a player into a vector in RK . + Though we have described a continuous model for conceptual simplicity, we discretize the court into V one-squarefoot tiles to gain computational tractability in fitting the LGCP surfaces. We expect this tile size to capture all interesting spatial variation. Furthermore, the discretization maps each player into RV , providing the necessary input + for NMF dimensionality reduction. 4.1. Fitting the LGCPs (b) grid Stephen Curry LGCP LeBron James LGCP 5 4 3 4 q 2 1 q 3 2 1 (c) LGCP Stephen Curry LGCP−NMF LeBron James LGCP−NMF 4 3 4 3 q 2 1 q 2 1 (d) LGCP-NMF Figure 1. NBA player representations: (a) original point process data from two players, (b) discretized counts, (c) LGCP surfaces, and (d) NMF reconstructed surfaces (K = 10). Made and missed shots are represented as blue circles and red ×’s, respectively. Some players have more data than others because only half of the stadiums had the tracking system in 2012-2013. For each player’s set of points, xn , the likelihood of the point process is discretely approximated as V p(xn |λn (·)) ≈ v=1 p(Xn,v |∆Aλn,v ) (13) Given point process realizations for each of N players, x1 , . . . , xN , our procedure is 1. Construct the count matrix Xn,v = # shots by player n in tile v on a discretized court. 2. Fit an intensity surface λn = (λn,1 , . . . , λn,V )T for each player n over the discretized court (LGCP). ¯ ¯ 3. Construct the data matrix Λ = (λ1 , . . . , λN )T , where where, overloading notation, λn (·) is the exact intensity function, λn is the discretized intensity function (vector), and ∆A is the area of each tile (implicitly one from now on). This approximation comes from the completely spatially random property of the Poisson process, allowing us to treat each tile independently. The probability of the count present in each tile is Poisson, with uniform intensity λn,v . Explicitly representing the Gaussian random field zn , the Factorized Point Process Intensities: A Spatial Analysis of Professional Basketball 4.3. Alternative Approaches With the goal of discovering the shared structure among the collection of point processes, we can proceed in a few alternative directions. For instance, one could hand-select a spatial basis and directly fit weights for each individual point process, modeling the intensity as a weighted combination of these bases. However, this leads to multiple restrictions: firstly, choosing the spatial bases to cover the court is a highly subjective task (though, there are situations where it would be desirable to have such control); secondly, these bases are unlikely to match the natural symmetries of the basketball court. In contrast, modeling the intensities with LGCP-NMF uncovers the natural symmetries of the game without user guidance. Another approach would be to directly factorize the raw shot count matrix X. However, this method ignores spatial information, and essentially models the intensity surface as a set of V independent parameters. Empirically, this method yields a poorer, more degenerate basis, which can be seen in Figure 4(c). Furthermore, this is far less numerically stable, and jitter must be added to entries of Λ for convergence. Finally, another reasonable approach would apply PCA directly to the discretized LGCP intensity matrix Λ, though as Figure 4(d) demonstrates, the resulting mixed-sign decomposition leads to an unintuitive and visually uninterpretable basis. ● ● (a) Corner threes (b) Wing threes ● ● (c) Top of key threes (d) Long two-pointers Figure 3. A sample of basis vectors (surfaces) discovered by LGCP-NMF for K = 10. Each basis surface is the normalized intensity function of a particular shot type, and players’ shooting habits are a weighted combination of these shot types. Conditioned on certain shot type (e.g. corner three), the intensity function acts as a density over shot locations, where red indicates likely locations. posterior is 5. Results (14) N (zn |0, K) (15) (16) We graphically depict our point process data, LGCP representation, and LGCP-NMF reconstruction in Figure 1 for K = 10. There is wide variation in shot selection among NBA players - some shooters specialize in certain types of shots, whereas others will shoot from many locations on the court. Our method discovers basis vectors that correspond to visually interpretable shot types. Similar to the parts-based decomposition of human faces that NMF discovers in Lee & Seung (1999), LGCP-NMF discovers a shots-based decomposition of NBA players. Setting K = 10 and using the KL-based loss function, we display the resulting basis vectors in Figure 3. One basis corresponds to corner three-point shots 3(a), while another corresponds to wing three-point shots 3(b), and yet another to top of the key three point shots 3(c). A comparison between KL and Frobenius loss functions can be found in Figure 4. Furthermore, the player specific basis weights provide a concise characterization of their offensive habits. The weight wn,k can be interpreted as the amount player n takes shot type k, which quantifies intuitions about player behav- p(zn |xn ) ∝ p(xn |zn )p(zn ) V = v=1 e−λn,v Xn,v λn,v Xn,v ! λn = exp(zn + z0 ) where the prior over zn is a mean zero normal with covariance Kv,u = k(xv , xu ), determined by Equation 1, and z0 is a bias term that parameterizes the mean rate of the Poisson process. Samples of the posterior p(λn |xn ) can be constructed by transforming samples of zn |xn . To overcome the high correlation induced by the court’s spatial structure, we employ elliptical slice sampling (Murray et al., 2010) to approximate the posterior of λn for each player, and subsequently store the posterior mean. 4.2. NMF Optimization We now solve the optimization problem using techniques from Lee & Seung (2001) and Brunet et al. (2004), comparing the KL and Frobenius loss functions to highlight the difference between the resulting basis vectors. Factorized Point Process Intensities: A Spatial Analysis of Professional Basketball q q q q q q q q q q q q q q q q q q q LeBron James Brook Lopez Tyson Chandler Marc Gasol Tony Parker Kyrie Irving Stephen Curry James Harden Steve Novak 0.21 0.06 0.26 0.19 0.12 0.13 0.08 0.34 0.00 0.16 0.27 0.65 0.02 0.22 0.10 0.03 0.00 0.01 0.12 0.43 0.03 0.17 0.17 0.09 0.07 0.11 0.00 0.09 0.09 0.00 0.01 0.07 0.13 0.01 0.00 0.02 0.04 0.01 0.01 0.33 0.21 0.16 0.10 0.03 0.00 0.07 0.03 0.02 0.25 0.07 0.02 0.08 0.02 0.00 0.00 0.08 0.01 0.00 0.08 0.13 0.22 0.13 0.01 0.07 0.03 0.01 0.01 0.06 0.00 0.05 0.00 0.27 0.08 0.00 0.02 0.00 0.00 0.10 0.10 0.11 0.35 0.17 0.01 0.01 0.03 0.00 0.14 0.24 0.26 0.34 Table 1. Normalized player weights for each basis. The first three columns correspond to close-range shots, the next four correspond to mid-range shots, while the last three correspond to three-point shots. Larger values are highlighted, revealing the general ‘type’ of shooter each player is. The weights themselves match intuition about players shooting habits (e.g. three-point specialist or mid-range shooter), while more exactly quantifying them. q q q q q q q q q q q q q q q q q q (a) LGCP-NMF (KL) q q q q q q q q q q q q q q q q q q (b) LGCP-NMF (Frobenius) q q q q q q q q q q q q q q q q q q (c) Direct NMF (KL) q q q q q q q q q q q q q q q q q q (d) LGCP-PCA Figure 4. Visual comparison of the basis resulting from various approaches to dimensionality reduction. The top two bases result from LGCP-NMF with the KL (top) and Frobenius (second) loss functions. The third row is the NMF basis applied to raw counts (no spatial continuity). The bottom row is the result of PCA applied to the LGCP intensity functions. LGCP-PCA fundamentally differs due to the negativity of the basis surfaces. Best viewed in color. ior. Table 1 compares normalized weights between a selection of players. Empirically, the KL-based NMF decomposition results in a more spatially diverse basis, where the Frobenius-based decomposition focuses on the region of high intensity near the basket at the expense of the rest of the court. This can be seen by comparing Figure 4(a) (KL) to Figure 4(b) (Frobenius). We also compare the two LGCP-NMF decompositions to the NMF decomposition done directly on the matrix of counts, X. The results in Figure 4(c) show a set of sparse basis vectors that are spatially unstructured. And lastly, we Factorized Point Process Intensities: A Spatial Analysis of Professional Basketball Predictive Likelihood (10−fold cv) −6.10 6.1. Latent variable model For player n, we model each shot event as kn,i ∼ Mult(wn,: ) ¯ ¯ xn,i |kn,i ∼ Mult(Bk ) n,i shot type location (βn,kn,i )) outcome log likelihood −6.20 yn,i |kn,i ∼ Bern(logit −1 K=1 K=3 K=5 K=10 Model K=15 K=20 LGCP Figure 5. Average player test data log likelihoods for LGCPNMF varying K and independent LGCP. For each fold, we held out 10% of each player’s shots, fit independent LGCPs and ran NMF (using the KL-based loss function) for varying K. The predictive performance of our representation improves upon the high dimensional independent LGCPs, showing the importance of pooling information across players. ¯ where Bk ≡ Bk / k Bk is the normalized basis, and the player weights wn,k are adjusted to reflect the total mass ¯ of each unnormalized basis. NMF does not constrain each basis vector to a certain value, so the volume of each basis vector is a meaningful quantity that corresponds to how common a shot type is. We transfer this information into the weights by setting wn,k = wn,k ¯ v −6.30 Bk (v). adjusted basis loadings We do not directly observe the shot type, k, only the shot location xn,i . Omitting n and i to simplify notation, we can compute the the predictive distribution K p(y|x) = depict the PCA decomposition of the LGCP matrix Λ in Figure 4(d). This yields the most colorful decomposition because the basis vectors and player weights are unconstrained real numbers. This renders the basis vectors uninterpretable as intensity functions. Upon visual inspection, the corner three-point ‘feature’ that is salient in the LGCPNMF decompositions appears in five separate PCA vectors, some positive, some negative. This is the cancelation phenomenon that NMF avoids. We compare the fit of the low rank NMF reconstructions and the original LGCPs on held out test data in Figure 5. The NMF decomposition achieves superior predictive performance over the original independent LGCPs in addition to its compressed representation and interpretable basis. k=1 K p(y|k)p(k|x) p(x|k)p(k) k p(x|k )p(k ) = z=1 p(y|k) where the outcome distribution is red and the location distribution is blue for clarity. The shot type decomposition given by B provides a natural way to share information between shooters to reduce the variance in our estimated surfaces. We hierarchically model player probability parameters βn,k with respect to each shot type. The prior over parameters is 2 β0,k ∼ N (0, σ0 ) 2 σk diffuse global prior basis variance player/basis params ∼ Inv-Gamma(a, b) 2 N (β0,k , σk ) βn,k ∼ 6. From Shooting Frequency to Efficiency Unadjusted field goal percentage, or the probability a player makes an attempted shot, is a statistic of interest when evaluating player value. This statistic, however, is spatially uninformed, and washes away important variation due to shooting circumstances. Leveraging the vocabulary of shot types provided by the basis vectors, we model a player’s field goal percentage for each of the shot types. We decompose a player’s field goal percentage into a weighted combination of K basis field goal percentages, which provides a higher resolution summary of an offensive player’s skills. Our aim is to estimate the probability of a made shot for each point in the offensive half court for each individual player. 2 where the global means, β0,k , and variances, σk , are given 2 diffuse priors, σ0 = 100, and a = b = .1. The goal of this hierarchical prior structure is to share information between players about a particular shot type. Furthermore, it will shrink players with low sample sizes to the global mean. Some consequences of these modeling decisions will be discussed in Section 7. 6.2. Inference Gibbs sampling is performed to draw posterior samples of the β and σ 2 parameters. To draw posterior samples of β|σ 2 , y, we use elliptical slice sampling to exploit the normal prior placed on β. We can draw samples of σ 2 |β, y directly due to conjugacy. Factorized Point Process Intensities: A Spatial Analysis of Professional Basketball Posterior Mean Court 1.0 0.035 0.8 0.030 0.025 0.6 0.020 0.015 0.010 0.2 0.005 0.0 0.000 Posterior Mean uncertainty q 0.4 q vides an accurate low dimensional summary of shooting habits and an intuitive basis that corresponds to shot types recognizable by basketball fans and coaches. After visualizing this basis and discussing some of its properties as a quantification of player habits, we then used the decomposition to form interpretable estimates of a spatially shooting efficiency. We see a few directions for future work. Due to the relationship between KL-based NMF and some fully generative latent variable models, including the probabilistic latent semantic model (Ding et al., 2008) and latent Dirichlet allocation (Blei et al., 2003), we are interested in jointly modeling the point process and intensity surface decomposition in a fully generative model. This spatially informed LDA would model the non-stationary spatial structure the data exhibit within each non-negative basis surface, opening the door for a richer parameterization of offensive shooting habits that could include defensive effects. Furthermore, jointly modeling spatial field goal percentage and intensity can capture the correlation between player skill and shooting habits. Common intuition that players will take more shots from locations where they have more accuracy is missed in our treatment, yet modeling this effect may yield a more accurate characterization of a player’s habits and ability. We also see potential spatio-temporal extensions of our model. For instance, the per-game occurrence of shots over the course of a season or the daily occurrence of crimes within a city can be viewed as a spatio-temporal point process. We can extend the LGCP-NMF framework by introducing temporal correlation in the weights of the NMF decomposition. This may decouple spatial patterns from temporal patterns in the data, revealing interesting structure and offering a reduced representation. (a) global mean LeBron James Posterior Court 1.0 (b) posterior variance Steve Novak Posterior Court 1.0 0.8 0.8 0.6 0.6 q 0.4 q 0.4 0.2 0.2 0.0 0.0 (c) Kyrie Irving Posterior Court 1.0 (d) Stephen Curry Posterior Court 1.0 0.8 0.8 0.6 0.6 q 0.4 q 0.4 0.2 0.2 0.0 0.0 (e) (f) Figure 6. (a) Global efficiency surface and (b) posterior uncertainty. (c-f) Spatial efficiency for a selection of players. Red indicates the highest field goal percentage and dark blue represents the lowest. Novak and Curry are known for their 3-point shooting, whereas James and Irving are known for efficiency near the basket. 6.3. Results We visualize the global mean field goal percentage surface, corresponding parameters to β0,k in Figure 6(a). Beside it, we show one standard deviation of posterior uncertainty in the mean surface. Below the global mean, we show a few examples of individual player field goal percentage surfaces. These visualizations allow us to compare players’ efficiency with respect to regions of the court. For instance, our fit suggests that both Kyrie Irving and Steve Novak are below average from basis 4, the baseline jump shot, whereas Stephen Curry is an above average corner three point shooter, valuable information for a defending player. More details are available in the supplemental material. Acknowledgments The authors would like to acknowledge the Harvard XY Hoops group - Alex Franks, Alex D’Amour, Ryan Grossman, and Dan Cervone - as well as the HIPS lab and several referees for helpful suggestions and discussion. We thank STATS LLC for providing the data. To compare various NMF optimization procedures, the authors used the r package NMF (Gaujoux & Seoighe, 2010). References Adams, Ryan P., Murray, Iain, and MacKay, David J.C. Tractable nonparametric Bayesian inference in Poisson processes with Gaussian process intensities. In Proceedings of the 26th International Conference on Machine Learning (ICML), Montreal, Canada, 2009. 7. Discussion We have presented a method that models related point processes using a constrained matrix decomposition of independently fit intensity surfaces. Our representation pro- Factorized Point Process Intensities: A Spatial Analysis of Professional Basketball Adams, Ryan P., Dahl, George E., and Murray, Iain. Incorporating side information into probabilistic matrix factorization using Gaussian processes. In Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI), 2010. Benes, Viktor, Bodl´ k, Karel, and Wagepetersens, Jesper a Møller Rasmus Plenge. Bayesian analysis of log Gaussian Cox processes for disease mapping. Technical report, Aalborg University, 2002. Blei, David M., Ng, Andrew Y., and Jordan, Michael I. Latent Dirichlet allocation. The Journal of Machine Learning Research, 3:993–1022, 2003. Brunet, Jean-Philippe, Tamayo, Pablo, Golub, Todd R, and Mesirov, Jill P. Metagenes and molecular pattern discovery using matrix factorization. Proceedings of the National Academy of Sciences of the United States of America, 101.12:4164–9, 2004. Cox, D. R. Some statistical methods connected with series of events. Journal of the Royal Statistical Society. Series B, 17(2):129–164, 1955. Cunningham, John P, Yu, Byron M, Shenoy, Krishna V, and Sahani, Maneesh. Inferring neural firing rates from spike trains using Gaussian processes. Advances in Neural Information Processing Systems (NIPS), 2008. Diggle, Peter. Statistical Analysis of Spatial and SpatioTemporal Point Patterns. CRC Press, 2013. Ding, Chris, Li, Tao, and Peng, Wei. On the equivalence between non-negative matrix factorization and probabilistic latent semantic indexing. Computational Statistics & Data Analysis, 52:3913–3927, 2008. Gaujoux, Renaud and Seoighe, Cathal. A flexible r package for nonnegative matrix factorization. BMC Bioinformatics, 11(1):367, 2010. ISSN 1471-2105. doi: 10.1186/1471-2105-11-367. Goldsberry, Kirk and Weiss, Eric. The Dwight effect: A new ensemble of interior defense analytics for the nba. In Sloan Sports Analytics Conference, 2012. Kingman, John Frank Charles. Poisson Processes. Oxford university press, 1992. Lee, Daniel D. and Seung, H. Sebastian. Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788–791, 1999. Lee, Daniel D. and Seung, H. Sebastian. Algorithms for non-negative matrix factorization. Advances in Neural Information Processing Systems (NIPS), 13:556–562, 2001. Møller, Jesper, Syversveen, Anne Randi, and Waagepetersen, Rasmus Plenge. Log Gaussian Cox processes. Scandinavian Journal of Statistics, 25 (3):451–482, 1998. Murray, Iain, Adams, Ryan P., and MacKay, David J.C. Elliptical slice sampling. Journal of Machine Learning Research: Workshop and Conference Proceedings (AISTATS), 9:541–548, 2010. Rasmussen, Carl Edward and Williams, Christopher K.I. Gaussian Processes for Machine Learning. The MIT Press, Cambridge, Massachusetts, 2006. Yu, Byron M., John P. Cunningham, Gopal Santhanam, Ryu, Stephen I., and Shenoy, Krishna V. Gaussianprocess factor analysis for low-dimensional single-trial analysis of neural population activity. Advances in Neural Information Processing Systems (NIPS), 2009.