# Data

### On this site, you will find the following:

- Realized single-stock return skewness data calculated using the methodology of Neuberger and Payne (2018; WP).
- Capacity overhang data calculated using the stochastic frontier model methodology advocated in Aretz and Pope (2018; JF).

Please do not hesitate to e-mail me if you have any questions about the data (kevin.aretzATmanchester.ac.uk).

## Neuberger and Payne (2018) Realized Skewness Data (1954-End 2017)

### Downloadable Data:

The realized skewness estimate data used in my working paper titled "Do Stock Markets Really Care About Skewness?" (joint with Eser Arisoy and currently under submission at the Journal of Financial and Quantitative Analysis) can be downloaded from HERE (csv file; circa 303 MB). The sample period of the data runs from December 1954 to December 2017.

The data contain the CRSP permanent identifier ("permno"), the calendar year and month, and Neuberger and Payne (2018) estimates of the skewness of single-stock dollar returns over various return horizons. The name of each estimate starts with "NP18RS." The last part of the name then indicates the return horizon. For example, "NP18RS_9months" is the realized skewness of single-stock dollar returns over a nine-month return period, estimated using only data up until the associated year and month. Please see our paper for more information about the calculation and the timing of the realized skewness estimates.
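As a rough illustration of the layout described above, the following Python sketch streams the csv file row by row (sensible given its size of circa 303 MB) and extracts one skewness horizon. Only "permno" and the "NP18RS" naming scheme are taken from the description. The column names `year` and `month`, the missing-value codes, and the made-up sample values are assumptions for this example and should be checked against the actual file header.

```python
import csv
import io

def read_skewness(handle, column="NP18RS_9months",
                  year_col="year", month_col="month"):
    """Yield (permno, year, month, skewness), skipping missing estimates.

    year_col and month_col are assumed names; adjust them to the real header.
    """
    for row in csv.DictReader(handle):
        raw = (row.get(column) or "").strip()
        if raw in ("", ".", "NA", "NaN"):  # assumed missing-value codes
            continue
        yield int(row["permno"]), int(row[year_col]), int(row[month_col]), float(raw)

# Tiny in-memory stand-in for the real file (values are invented).
sample = io.StringIO(
    "permno,year,month,NP18RS_9months\n"
    "10001,2017,11,0.42\n"
    "10001,2017,12,\n"
    "10002,2017,12,-1.30\n"
)
records = list(read_skewness(sample))

# With the downloaded file (hypothetical local name):
# with open("skewness.csv", newline="") as f:
#     for permno, year, month, skew in read_skewness(f):
#         ...
```

Reading lazily keeps memory use flat, so a filter on permno or date can be applied inside the loop without loading the full file.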

### Estimation:

We use Equation (12) in Neuberger and Payne's (2018) working paper "The Skewness of the Stock Market at Long Horizons" (available on SSRN from HERE) to calculate the skewness of single-stock dollar returns. In particular, we start from daily data over a period of five years ending with the year-month combination shown in the data. We use the first year in that five-year period to calculate the covariance between the geometric average of past returns ("y_{t-1}") and the approximated squared return ("x^{(2,E)}(r_t)") - see the last (unnumbered) equation in Section 3 of Neuberger and Payne's (2018) paper. Using the covariance estimate, we then use the remaining four years to calculate the skewness of the dollar return according to Equation (12) in the paper.

IMPORTANT: We have calculated skewness for all stocks available on CRSP with five consecutive years of daily data in the sample period from start-1950 to end-2017, which explains why the data start in December 1954. The data include non-common equity (share codes other than 10 or 11) and shares traded on exchanges other than the NYSE, AMEX, or Nasdaq (exchange codes other than 1, 2, and 3). You may want to exclude non-common equity and shares not traded on the NYSE, AMEX, or Nasdaq. Finally, the data are not yet winsorized or trimmed at any level. To be useful, the data have to be winsorized or trimmed to take care of extreme outliers.
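One possible way to apply the screens and the outlier treatment mentioned above is sketched below in Python. It is only an illustration. The 1st and 99th percentile cutoffs, the choice to winsorize within each month's cross-section, and the record layout are assumptions, since the page does not prescribe a level or method. The share and exchange codes are not part of the skewness file and would have to be merged in from CRSP first.

```python
from collections import defaultdict

def keep_stock(shrcd, exchcd):
    """Common equity (share codes 10, 11) on NYSE, AMEX, or Nasdaq (codes 1, 2, 3)."""
    return shrcd in (10, 11) and exchcd in (1, 2, 3)

def percentile(sorted_vals, q):
    """Linearly interpolated percentile (q in [0, 1]) of an ascending list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def winsorize(values, lower=0.01, upper=0.99):
    """Clip values to the [lower, upper] percentile range (assumed cutoffs)."""
    if not values:
        return []
    s = sorted(values)
    lo_cut, hi_cut = percentile(s, lower), percentile(s, upper)
    return [min(max(v, lo_cut), hi_cut) for v in values]

def winsorize_by_month(records, lower=0.01, upper=0.99):
    """records: (permno, year, month, skewness); clip within each month."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec[1], rec[2])].append(rec)
    out = []
    for key in sorted(groups):
        clipped = winsorize([r[3] for r in groups[key]], lower, upper)
        out.extend((r[0], r[1], r[2], v) for r, v in zip(groups[key], clipped))
    return out

# Invented cross-section with two extreme outliers.
demo = winsorize([-100, 1, 2, 3, 4, 5, 6, 7, 8, 9, 100], lower=0.1, upper=0.9)
```

Trimming instead of winsorizing would drop the observations outside the cutoffs rather than clip them. Which treatment is appropriate depends on the application.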

### The Pricing of Historical Realized Skewness:

While not relevant for our working paper, the table below takes a casual look at the pricing of historical realized skewness estimated according to Neuberger and Payne (2018) in the cross-section of US single stocks. To that end, we sort all stocks with a price above USD 2 at the end of month t-1 into percentile portfolios according to various realized skewness estimates calculated using data until the end of month t-1. The realized skewness estimates represent the realized skewness of monthly, quarterly, semi-annual, annual, and two-year returns. We value-weight the portfolios and hold them over month t. We also form a spread portfolio long the highest and short the lowest skewness portfolio. To adjust the spread portfolio for risk, we regress its return on the excess market return ("CAPM"), the Fama-French 3-factor model factors ("FF3"), and the Fama-French 5-factor model factors ("FF5") and report the intercept. T-statistics are calculated using Newey and West's (1987) formula with a lag length of twelve. The sample period is July 1963 to December 2017.
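Two building blocks of the exercise described above, value-weighting a portfolio and computing a Newey-West t-statistic with twelve lags, can be sketched in Python as follows. This is a simplified illustration, not the code behind the table. It tests only the mean of the spread return (an intercept-only regression), omits the CAPM, FF3, and FF5 factor regressions, and all numbers are invented.

```python
import math

def value_weighted_return(returns, market_caps):
    """Portfolio return with weights proportional to market capitalization."""
    total = sum(market_caps)
    return sum(r * w for r, w in zip(returns, market_caps)) / total

def newey_west_tstat(series, lags=12):
    """t-statistic of the series mean using Newey-West (Bartlett) weights."""
    t = len(series)
    mean = sum(series) / t
    e = [v - mean for v in series]
    lrv = sum(v * v for v in e) / t               # lag-0 autocovariance
    for j in range(1, min(lags, t - 1) + 1):
        gamma = sum(e[i] * e[i - j] for i in range(j, t)) / t
        lrv += 2.0 * (1.0 - j / (lags + 1)) * gamma
    return mean / math.sqrt(lrv / t)

# Invented monthly returns (percent) of the highest and lowest skewness portfolios.
high = [0.8, -1.2, 0.5, 1.1, -0.3, 0.9]
low = [1.0, -0.9, 0.7, 1.4, 0.1, 0.6]
spread = [h - l for h, l in zip(high, low)]

vw = value_weighted_return([0.10, -0.05], [300.0, 100.0])
t_stat = newey_west_tstat(spread, lags=12)
```

With `lags=0` the statistic reduces to the mean divided by its standard error under the population variance, with no autocorrelation correction, which gives a quick sanity check.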

Similar to the vast majority of studies investigating the pricing of historical skewness, the table suggests that the historical skewness estimates calculated using Neuberger and Payne's (2018) methodology are not significantly negatively priced in single stocks.

## Capacity Overhang Estimate Data (End-1969-Mid 2018)

### Downloadable Data:

The capacity overhang estimate data used in my paper titled "Real Options Models of the Firm, Capacity Overhang, and the Cross-Section of Stock Returns" (joint with Peter Pope and published in 2018 in the Journal of Finance (volume 73, 1363-1415)) can be downloaded from HERE (csv file; circa 48 MB). The data have been updated to mid-2018.

The data contain the CRSP permanent identifier ("permno"), the calendar year and month, and the capacity overhang estimate. For each month, the capacity overhang estimate is calculated by combining the parameter estimates from a stochastic frontier model estimation using data until the end of the prior calendar year with the -- at that time -- most recent available values of the existing capacity proxy, the optimal capacity determinant proxies, and the capacity overhang proxies. Please see our paper for more information about the timing of the variables.

### Estimation Command:

We use the user-written Stata command "sfcross" to estimate the free parameters of the stochastic frontier model employed to calculate the capacity overhang estimate. The estimation command looks as follows:

*sfcross LnExistingCapacityProxy Vector(LnOptimalCapacityDeterminantsProxies) Vector(IndustryDummies) if time <= EndEstimationPeriod, noconstant nolog technique(bfgs) svfrontier(StartingValues) distribution(tnormal) cost emean(Vector(CapacityOverhangDeterminants), noconstant)*

While we use the ten Campbell (1996) industries to calculate the industry dummies in the paper, for comparability, we have decided to use the 17 Fama-French industries in the calculation of the updated data. More information about the 17 Fama-French industries is provided on Kenneth French's website, in particular, HERE.

### Some Updated Results:

The table below shows the results from re-estimations of several key Fama-MacBeth (1973) regressions using the capacity overhang data available above. The dependent variable is the single-stock return over month t. The main independent variable is the one-month lagged capacity overhang variable (i.e., the value of capacity overhang estimated using only information publicly available until the end of month t-1). The control variables are the lagged market beta, log market capitalization, the log book-to-market ratio, the log eleven-month compounded past return ("momentum"), total profitability, return-on-equity, and log asset growth. See our paper for more information about the regression variables. The main numbers are premia, in percent and per month. The t-statistics, reported in square brackets, are derived from Newey-West (1987) standard errors, setting the lag parameter equal to twelve months. The sample period is January 1972 to June 2018, which is longer than the paper's sample period of January 1972 to December 2013.
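For readers unfamiliar with the procedure, a minimal Python sketch of a univariate Fama-MacBeth regression is given below. Each month, returns are regressed on a constant and lagged capacity overhang. The premium is the time-series average of the monthly slopes, and its t-statistic uses a Newey-West standard error with twelve lags. The sketch omits the control variables (a multivariate version would need a linear-algebra routine), the tuple layout is an assumption, and the numbers are invented, so it does not reproduce the table.

```python
import math
from collections import defaultdict

def ols_slope(x, y):
    """Slope from a cross-sectional regression of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx

def newey_west_se(series, lags):
    """Newey-West (Bartlett kernel) standard error of the series mean."""
    t = len(series)
    mean = sum(series) / t
    e = [v - mean for v in series]
    lrv = sum(v * v for v in e) / t
    for j in range(1, min(lags, t - 1) + 1):
        gamma = sum(e[i] * e[i - j] for i in range(j, t)) / t
        lrv += 2.0 * (1.0 - j / (lags + 1)) * gamma
    return math.sqrt(lrv / t)

def fama_macbeth(rows, lags=12):
    """rows: (month_key, lagged_overhang, return_in_percent) per stock-month."""
    by_month = defaultdict(lambda: ([], []))
    for month, x, y in rows:
        by_month[month][0].append(x)
        by_month[month][1].append(y)
    slopes = [ols_slope(*by_month[m]) for m in sorted(by_month)]
    premium = sum(slopes) / len(slopes)
    return premium, premium / newey_west_se(slopes, lags)

# Invented panel: three months, three stocks each.
rows = [
    ((2018, 4), 0.0, 1.0), ((2018, 4), 1.0, 0.5), ((2018, 4), 2.0, 0.0),
    ((2018, 5), 0.0, 0.0), ((2018, 5), 1.0, -1.0), ((2018, 5), 2.0, -2.0),
    ((2018, 6), 0.0, 3.0), ((2018, 6), 1.0, 2.25), ((2018, 6), 2.0, 1.5),
]
premium, t_stat = fama_macbeth(rows)
```

In this toy panel the monthly slopes are -0.5, -1.0, and -0.75, so the estimated premium is their average.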

The table shows that, even over the updated sample period, capacity overhang remains significantly negatively priced, with a premium of about -0.60 to -0.80 percent per month. Also, controlling for capacity overhang helps to explain momentum anomalies (compare columns (3) and (4)) and profitability anomalies (compare columns (5) and (6)). Conversely, controlling for capacity overhang does not help to explain asset growth anomalies (compare columns (7) and (8)).