# Data

Please don't hesitate to e-mail me if you have any questions about the data (kevin.aretzATmanchester.ac.uk).

## Capacity Overhang Estimate Data (End-1969 to Mid-2018)

### Downloadable Data:

The capacity overhang estimate data used in my paper "Real Options Models of the Firm, Capacity Overhang, and the Cross-Section of Stock Returns" (joint with Peter Pope; Journal of Finance, 2018, volume 73, pages 1363-1415) can be downloaded from **HERE** (CSV file; about 48 MB). The data have been updated to mid-2018.

The data contain the CRSP permanent identifier ("permno"), the calendar year and month, and the capacity overhang estimate. For each month, the capacity overhang estimate combines the parameter estimates from a stochastic frontier model, estimated using data up to the end of the prior calendar year, with the most recent values of the existing capacity proxy, the optimal capacity determinant proxies, and the capacity overhang proxies that were available at that time. Please see our paper for more information about the timing of the variables.
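For readers working in Python rather than Stata, a minimal sketch of loading the file into a lookup keyed by firm and month might look as follows. Only "permno" is named on this page, so the year, month, and estimate column headers used here (`year`, `month`, `capoverhang`) are hypothetical and should be checked against the downloaded file.

```python
import csv
from typing import Dict, Iterable, Tuple

# Hypothetical column headers: only "permno" is named on this page; the
# other three are assumptions to be checked against the downloaded file.
PERMNO_COL = "permno"
YEAR_COL = "year"
MONTH_COL = "month"
OVERHANG_COL = "capoverhang"

Key = Tuple[int, int, int]  # (permno, calendar year, calendar month)


def load_overhang(lines: Iterable[str]) -> Dict[Key, float]:
    """Map (permno, year, month) to the capacity overhang estimate.

    `lines` can be an open file or any iterable of CSV lines.
    Rows with a missing estimate are skipped.
    """
    data: Dict[Key, float] = {}
    for row in csv.DictReader(lines):
        value = row[OVERHANG_COL].strip()
        if value in ("", "."):
            continue  # missing estimate (blank, or a Stata-style ".")
        key = (int(row[PERMNO_COL]), int(row[YEAR_COL]), int(row[MONTH_COL]))
        data[key] = float(value)
    return data
```

With the downloaded file opened via `open(path, newline="")`, passing the file object to `load_overhang` returns the full lookup; `csv.DictReader` streams the rows, so the 48 MB file is not read into memory as text all at once.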

### Estimation Command:

We use the user-written Stata command "sfcross" to estimate the free parameters of the stochastic frontier model employed to calculate the capacity overhang estimate. The estimation command looks as follows:

`sfcross LnExistingCapacityProxy Vector(LnOptimalCapacityDeterminantsProxies) Vector(IndustryDummies) if time <= EndEstimationPeriod, noconstant nolog technique(bfgs) svfrontier(StartingValues) distribution(tnormal) cost emean(Vector(CapacityOverhangDeterminants), noconstant)`
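The options in this command specify a cost-type frontier (`cost`), so the composed error is ε = v + u rather than v − u, with a one-sided term u that follows a normal distribution truncated at zero (`distribution(tnormal)`) and whose pre-truncation mean is a linear function of the capacity overhang determinants without a constant (`emean(..., noconstant)`). A common way to turn such estimates into a firm-level value of u is the conditional mean E[u | ε] of Jondrow, Lovell, Materov and Schmidt (1982). The Python sketch below computes it from an estimated residual ε, mean μ, and the two standard deviations; treating this predictor as the capacity overhang estimate is an assumption here, so please see the paper for the exact definition.

```python
import math


def normal_pdf(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def jlms_cost_truncnormal(eps: float, mu: float, sigma_u: float, sigma_v: float) -> float:
    """E[u | eps] for a cost frontier  ln K = x'b + v + u,  eps = v + u,
    with v ~ N(0, sigma_v^2) and u ~ N(mu, sigma_u^2) truncated at zero.

    In the emean() specification, mu would be z'delta for each firm-month.
    """
    s2 = sigma_u ** 2 + sigma_v ** 2
    # Given eps, u is N(mu_star, sigma_star^2) truncated at zero.
    mu_star = (sigma_v ** 2 * mu + sigma_u ** 2 * eps) / s2
    sigma_star = sigma_u * sigma_v / math.sqrt(s2)
    z = mu_star / sigma_star
    # Mean of a normal truncated below at zero (may lose precision for very negative z).
    return mu_star + sigma_star * normal_pdf(z) / normal_cdf(z)
```

Note the plus sign in front of `sigma_u ** 2 * eps`: with a production frontier (ε = v − u) that term would enter with a minus sign instead.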

While the paper uses the ten Campbell (1996) industries to construct the industry dummies, we use the 17 Fama-French industries in the calculation of the updated data for comparability. More information about the 17 Fama-French industries is available on Kenneth French's website, in particular **HERE**.
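Because the command above includes `noconstant`, a full set of industry dummies, one per industry, can enter the regression as industry-specific intercepts without being collinear with a common intercept. A minimal sketch for the 17 Fama-French industries follows; it assumes each firm has already been assigned an industry number from 1 to 17 (the SIC-code mapping itself is on Kenneth French's website and is not reproduced here), and the function name is hypothetical.

```python
from typing import List

N_FF17 = 17  # number of Fama-French 17 industry groups


def ff17_dummies(industry: int) -> List[int]:
    """Full set of 17 industry dummies for a firm in group `industry` (1..17).

    With no constant in the frontier regression, all 17 dummies can be
    included; with a constant, one group would have to be dropped.
    """
    if not 1 <= industry <= N_FF17:
        raise ValueError(f"industry must be in 1..{N_FF17}, got {industry}")
    return [1 if k == industry else 0 for k in range(1, N_FF17 + 1)]
```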

### Some Updated Results:

The table below shows the results from re-estimating several key Fama-MacBeth (1973) regressions using the capacity overhang data available above. The dependent variable is the individual stock return over month t. The main independent variable is the one-month lagged capacity overhang variable (i.e., the value of capacity overhang estimated using only information publicly available until the end of month t-1). The control variables are the lagged market beta, log market capitalization, the log book-to-market ratio, the log eleven-month compounded past return ("momentum"), total profitability, return-on-equity, and log asset growth. See our paper for more information about the regression variables. The main entries are the estimated premia, in percent per month. The t-statistics, reported in square brackets, are derived from Newey-West (1987) standard errors with the lag parameter set to twelve months. The sample period is January 1972 to June 2018, which is longer than the sample period in the paper (January 1972 to December 2013).
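The Fama-MacBeth procedure with Newey-West t-statistics described above can be sketched in Python as follows. This is a minimal, univariate illustration under stated assumptions, not the code behind the table: each month, stock returns are regressed cross-sectionally on the previous month's overhang estimate, and the time series of monthly slopes is averaged, with a Newey-West (1987) standard error using Bartlett weights and twelve lags. The control variables are omitted for brevity, the monthly returns (from CRSP, not part of the download) are a hypothetical input, and all function names are hypothetical.

```python
import math
from statistics import fmean
from typing import Dict, List, Sequence, Tuple

FirmMonth = Tuple[int, int, int]  # (permno, year, month)
Month = Tuple[int, int]           # (year, month)


def previous_month(year: int, month: int) -> Month:
    """Calendar month t-1 for month t."""
    return (year - 1, 12) if month == 1 else (year, month - 1)


def build_panel(overhang: Dict[FirmMonth, float],
                returns: Dict[FirmMonth, float]) -> Dict[Month, List[Tuple[float, float]]]:
    """Pair each stock's return over month t with its overhang estimate for month t-1."""
    panel: Dict[Month, List[Tuple[float, float]]] = {}
    for (permno, year, month), ret in returns.items():
        key = (permno, *previous_month(year, month))
        if key in overhang:
            panel.setdefault((year, month), []).append((overhang[key], ret))
    return panel


def cross_section_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """OLS slope of y on x (with an intercept) within one month's cross-section."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def newey_west_tstat(series: Sequence[float], lags: int = 12) -> Tuple[float, float]:
    """Mean of a time series and its t-statistic, using a Newey-West (1987)
    standard error with Bartlett weights 1 - lag / (lags + 1)."""
    n = len(series)
    mean = fmean(series)
    dev = [s - mean for s in series]

    def autocov(lag: int) -> float:
        return sum(dev[t] * dev[t - lag] for t in range(lag, n)) / n

    long_run_var = autocov(0) + 2.0 * sum(
        (1.0 - lag / (lags + 1)) * autocov(lag) for lag in range(1, min(lags, n - 1) + 1)
    )
    return mean, mean / math.sqrt(long_run_var / n)


def fama_macbeth(panel: Dict[Month, List[Tuple[float, float]]],
                 lags: int = 12) -> Tuple[float, float]:
    """Average monthly cross-sectional slope (the premium) and its Newey-West t-statistic."""
    slopes = []
    for month in sorted(panel):
        x = [xi for xi, _ in panel[month]]
        y = [yi for _, yi in panel[month]]
        slopes.append(cross_section_slope(x, y))
    return newey_west_tstat(slopes, lags)
```

If returns are measured in percent, the first element returned by `fama_macbeth(build_panel(overhang, returns))` is a premium in percent per month, the same units as the table; the multivariate version would replace `cross_section_slope` with a cross-sectional regression on all regressors.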

The table shows that, even over the updated sample period, capacity overhang remains significantly negatively priced, with a premium of about -0.60 to -0.80 percent per month. Controlling for capacity overhang also helps to explain momentum anomalies (compare columns (3) and (4)) and profitability anomalies (compare columns (5) and (6)). However, controlling for capacity overhang does not help to explain asset growth anomalies (compare columns (7) and (8)).