10-K/Q Section Text Change Detection
Tuesday, October 30, 2018

The Code

Goal

Reduce the amount of time analysts spend reading 10-K/Qs by highlighting the sections which change the most between periods.

Hypothesis

The cosine distance between Term Frequency-Inverse Document Frequency (TF-IDF) vectors of 10-K sections is a useful proxy for semantic change in those sections over time.
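As a reference point, here is one standard formulation of the two quantities in the hypothesis. The post does not say which TF-IDF weighting it uses, so the plain log-IDF form below is an assumption. For a term $t$ and a section $s$ drawn from a corpus of $N$ sections:

```latex
\mathrm{tfidf}(t, s) = \mathrm{tf}(t, s) \cdot \log \frac{N}{\mathrm{df}(t)},
\qquad
\mathrm{dist}(\mathbf{u}, \mathbf{v}) = 1 - \frac{\mathbf{u} \cdot \mathbf{v}}{\lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert}
```

Here $\mathrm{tf}(t, s)$ counts occurrences of $t$ in $s$, $\mathrm{df}(t)$ is the number of sections in the corpus that contain $t$, and $\mathbf{u}$, $\mathbf{v}$ are the TF-IDF vectors of the same section in consecutive periods. Because TF-IDF weights are non-negative, the distance lies between 0 (the vectors point in the same direction) and 1 (no weighted terms in common).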

Procedure

  1. Use the Calcbench Python API Client to download the Risk Factors section of each 10-K.
  2. Tokenize the sections.
  3. Build TF-IDF matrices.
  4. Compute the cosine distance between each section and the same section from the previous filing/period.
  5. Render the matrix of distances with the largest distances highlighted.
  6. Review large changes by “diffing” the documents whose distance exceeds a chosen threshold.
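Steps 2 through 4, plus the diff in step 6, can be sketched in standard-library Python. This is a minimal illustration under stated assumptions, not the post's code: the `SECTIONS` dictionary is hypothetical toy text standing in for what the Calcbench API client would return, the helper names are made up for this example, the IDF is computed over one company's filings only, and the weighting is the plain tf·log(N/df) form. In practice a library such as scikit-learn's `TfidfVectorizer` would more likely do the vectorizing.

```python
import math
import re
from collections import Counter
from difflib import unified_diff

# Hypothetical toy input: Risk Factors text keyed by fiscal year. In the post,
# this text comes from the Calcbench Python API client (step 1), not shown here.
SECTIONS = {
    2016: "Competition may reduce our margins. Regulation may change.",
    2017: "Competition may reduce our margins. Regulation may change.",
    2018: "Tariffs and trade disputes may disrupt our supply chain. Regulation may change.",
}


def tokenize(text):
    """Step 2: lowercase the text and keep runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(token_lists):
    """Step 3: one sparse TF-IDF vector (term -> weight) per document.

    Uses raw term counts times log(N / df), so a term present in every
    document gets weight 0.
    """
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    return [
        {term: count * math.log(n_docs / doc_freq[term])
         for term, count in Counter(tokens).items()}
        for tokens in token_lists
    ]


def cosine_distance(u, v):
    """1 - cosine similarity of two sparse vectors.

    Convention (an assumption): if either vector is all zeros, report 0.0.
    The result is clamped at 0.0 to absorb floating-point rounding.
    """
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return max(0.0, 1.0 - dot / (norm_u * norm_v))


def year_over_year_distances(sections):
    """Step 4: distance between each year's section and the previous year's."""
    years = sorted(sections)
    vectors = tfidf_vectors([tokenize(sections[y]) for y in years])
    return {years[i]: cosine_distance(vectors[i], vectors[i - 1])
            for i in range(1, len(years))}


def diff_flagged(sections, distances, threshold):
    """Step 6: sentence-level unified diffs for years whose distance exceeds threshold."""
    years = sorted(sections)
    diffs = {}
    for prev_year, year in zip(years, years[1:]):
        if distances[year] > threshold:
            before = re.split(r"(?<=[.!?])\s+", sections[prev_year].strip())
            after = re.split(r"(?<=[.!?])\s+", sections[year].strip())
            diffs[year] = list(unified_diff(before, after, fromfile=str(prev_year),
                                            tofile=str(year), lineterm=""))
    return diffs


if __name__ == "__main__":
    distances = year_over_year_distances(SECTIONS)
    for year, dist in distances.items():
        print(year, round(dist, 3))
    # The threshold is an arbitrary illustrative value.
    for lines in diff_flagged(SECTIONS, distances, threshold=0.1).values():
        print("\n".join(lines))
```

With this toy input, the identical 2016 and 2017 sections score a distance of essentially 0, while 2018, which swaps one risk factor for another, scores 1.0 because none of its distinctive terms appear in 2017. Only 2018 is flagged and diffed.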

Highlight Risk Factors with Greatest Change

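Each value is the cosine distance between a company's Risk Factors section in that year's 10-K and the same section in the prior year's filing. In the original heat map, the brightest cells marked the largest distances, that is, the sections that changed the most relative to the previous period. A value of 0 means no measured change; the all-zero 2009 column presumably reflects the lack of an earlier filing to compare against.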
Ticker 2018   2017   2016   2015   2014   2013   2012   2011   2010   2009
JNJ    0      0.01   0.607  0.02   0.57   0      0.042  0      0.026  0
JPM    0      0.55   0.008  0.008  0.013  0.023  0.013  0.028  0.427  0
WBA    0.007  0.014  0.017  0.187  0      0.063  0.244  0.099  0      0
DWDP   0      0.24   0.135  0.045  0.033  0.02   0.033  0.01   0.068  0
V      0      0.034  0.153  0.066  0.007  0.027  0.021  0.016  0.097  0
MCD    0      0.014  0.019  0.015  0.178  0.045  0.038  0.04   0.051  0
VZ     0      0.012  0.061  0.053  0.049  0.097  0.035  0.029  0.063  0
WMT    0.061  0.056  0.025  0.084  0.03   0.076  0.02   0.024  0      0
PFE    0      0.015  0.074  0.063  0.034  0.02   0.039  0.047  0.044  0
PG     0.026  0.02   0.02   0.069  0.022  0.018  0.102  0.026  0      0
UTX    0      0.022  0.006  0.031  0.009  0.04   0.033  0.072  0.069  0
HD     0      0.037  0.018  0.03   0.12   0.017  0.012  0.02   0.025  0
CVX    0      0.013  0.028  0.045  0.02   0.001  0.005  0.075  0.057  0
INTC   0      0      0.011  0.002  0.068  0.018  0.032  0.094  0.017  0
AXP    0      0.009  0.015  0.032  0.016  0.031  0.025  0.022  0.059  0
KO     0      0.013  0.007  0.014  0.012  0.011  0.037  0.035  0.065  0
UNH    0      0.008  0.006  0.006  0.009  0.023  0.014  0.052  0.038  0
MSFT   0.024  0.013  0.012  0.012  0.031  0.012  0.035  0.017  0      0
BA     0      0.005  0.005  0.002  0.004  0.009  0.032  0.036  0.046  0
MMM    0      0.008  0.008  0.026  0.007  0.009  0.005  0.044  0.026  0
DIS    0      0.003  0.003  0.006  0.026  0.012  0.025  0.035  0.014  0
MRK    0      0.014  0.006  0.019  0.012  0.009  0.012  0.022  0.023  0
GS     0      0.013  0.01   0.015  0.012  0.009  0.006  0.024  0.02   0
CAT    0      0.002  0.009  0.004  0.009  0.011  0.018  0.013  0.04   0
NKE    0.015  0.009  0.005  0.01   0.01   0.023  0.017  0.007  0      0
XOM    0      0.031  0.008  0.018  0.003  0.007  0.012  0.015  0      0
AAPL   0      0.004  0.003  0.001  0.004  0.003  0.027  0.017  0.004  0
TRV    0      0.005  0.006  0.002  0.005  0.012  0.007  0.012  0.016  0
CSCO   0.004  0.002  0.004  0.002  0.006  0.012  0.003  0.01   0      0
IBM    0      0.004  0.004  0.003  0.001  0.011  0.001  0.001  0.005  0
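The highlighting in step 5 amounts to ranking cells and flagging those above a cutoff. The sketch below copies three rows from the table; the helper name `flag_changes` and the 0.1 threshold are illustrative assumptions, not values from the post.

```python
# Three rows copied from the table above; columns run from 2018 down to 2009.
YEARS = list(range(2018, 2008, -1))
DISTANCES = {
    "JNJ": [0, 0.01, 0.607, 0.02, 0.57, 0, 0.042, 0, 0.026, 0],
    "JPM": [0, 0.55, 0.008, 0.008, 0.013, 0.023, 0.013, 0.028, 0.427, 0],
    "WBA": [0.007, 0.014, 0.017, 0.187, 0, 0.063, 0.244, 0.099, 0, 0],
}


def flag_changes(distances, years, threshold):
    """Return (ticker, year, distance) for every cell above threshold, largest first."""
    flagged = [
        (ticker, year, dist)
        for ticker, row in distances.items()
        for year, dist in zip(years, row)
        if dist > threshold
    ]
    return sorted(flagged, key=lambda item: item[2], reverse=True)


if __name__ == "__main__":
    # Hypothetical cutoff chosen only for illustration.
    for ticker, year, dist in flag_changes(DISTANCES, YEARS, threshold=0.1):
        print(f"{ticker} {year}: {dist}")
```

On these rows it surfaces JNJ 2016 (0.607), JNJ 2014 (0.57) and JPM 2017 (0.55) first, the same cells that stand out in the table; these are the filings one would then diff by hand in step 6.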
