10-K/Q Section Text Change Detection
Tuesday, October 30, 2018

The Code

Goal

Reduce the amount of time analysts spend reading 10-K/Qs by highlighting the sections which change the most between periods.

Hypothesis

The cosine distance between Term Frequency-Inverse Document Frequency (TF-IDF) vectors of the same 10-K section in consecutive periods is a useful proxy for semantic change in that section over time.
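To make the metric concrete, here is a minimal sketch of cosine distance between two sparse term-weight vectors. The function name and the toy weights are assumptions for illustration; they are not taken from the linked code or from real filings.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity for two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        # Two empty vectors count as identical; one empty vector as fully different.
        return 0.0 if norm_u == norm_v else 1.0
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical term weights, not real filing data.
unchanged = {"litigation": 0.5, "currency": 0.2}
revised = {"litigation": 0.5, "cyber": 0.4}
unrelated = {"tariff": 0.3}

d_same = cosine_distance(unchanged, unchanged)     # identical text: about 0
d_revised = cosine_distance(unchanged, revised)    # partial overlap: between 0 and 1
d_unrelated = cosine_distance(unchanged, unrelated)  # no shared terms: 1
```

A distance near 0 means the two sections use almost the same weighted vocabulary; a distance near 1 means they share almost none of it.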

Procedure

  1. Use the Calcbench Python API Client to download the Risk Factors section of each 10-K.
  2. Tokenize the sections.
  3. Build TF-IDF matrices.
  4. Compute the cosine distance between each section and the same section from the previous filing/period.
  5. Render the matrix of distances with the largest distances highlighted.
  6. Review large changes by “diffing” documents whose distance is above a certain threshold.
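Steps 2 through 4 can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not the code behind this post: the download step is omitted, the section texts are made-up placeholders, the tokenizer is a simple regex, and the smoothed IDF formula is one common choice among several.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Step 2: lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(token_lists):
    """Step 3: build one sparse TF-IDF vector (dict) per document."""
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        # Smoothed IDF (an assumed variant): ln((1 + N) / (1 + df)) + 1
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine_distance(u, v):
    """Step 4: 1 - cosine similarity of two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0 if norm_u == norm_v else 1.0
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical Risk Factors text for one company, keyed by fiscal year.
sections = {
    2016: "Currency fluctuations may reduce our revenue.",
    2017: "Currency fluctuations may reduce our revenue.",
    2018: "Currency fluctuations and cybersecurity incidents may reduce "
          "our revenue and harm our reputation.",
}

years = sorted(sections)
vectors = dict(zip(years, tfidf_vectors([tokenize(sections[y]) for y in years])))

# Distance of each period from the period immediately before it.
distances = {
    year: cosine_distance(vectors[year], vectors[prev])
    for prev, year in zip(years, years[1:])
}
```

In this toy input the 2017 text repeats 2016, so its distance is effectively zero, while the 2018 text adds new language and gets a positive distance. The earliest period has no predecessor and therefore no distance.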

Highlight Risk Factors with Greatest Change

The brightest cells mark the filings whose Risk Factors section changed the most relative to the previous period. Each value is the cosine distance from the prior period's section.
Ticker 2018 2017 2016 2015 2014 2013 2012 2011 2010 2009
JNJ 0 0.01 0.607 0.02 0.57 0 0.042 0 0.026 0
JPM 0 0.55 0.008 0.008 0.013 0.023 0.013 0.028 0.427 0
WBA 0.007 0.014 0.017 0.187 0 0.063 0.244 0.099 0 0
DWDP 0 0.24 0.135 0.045 0.033 0.02 0.033 0.01 0.068 0
V 0 0.034 0.153 0.066 0.007 0.027 0.021 0.016 0.097 0
MCD 0 0.014 0.019 0.015 0.178 0.045 0.038 0.04 0.051 0
VZ 0 0.012 0.061 0.053 0.049 0.097 0.035 0.029 0.063 0
WMT 0.061 0.056 0.025 0.084 0.03 0.076 0.02 0.024 0 0
PFE 0 0.015 0.074 0.063 0.034 0.02 0.039 0.047 0.044 0
PG 0.026 0.02 0.02 0.069 0.022 0.018 0.102 0.026 0 0
UTX 0 0.022 0.006 0.031 0.009 0.04 0.033 0.072 0.069 0
HD 0 0.037 0.018 0.03 0.12 0.017 0.012 0.02 0.025 0
CVX 0 0.013 0.028 0.045 0.02 0.001 0.005 0.075 0.057 0
INTC 0 0 0.011 0.002 0.068 0.018 0.032 0.094 0.017 0
AXP 0 0.009 0.015 0.032 0.016 0.031 0.025 0.022 0.059 0
KO 0 0.013 0.007 0.014 0.012 0.011 0.037 0.035 0.065 0
UNH 0 0.008 0.006 0.006 0.009 0.023 0.014 0.052 0.038 0
MSFT 0.024 0.013 0.012 0.012 0.031 0.012 0.035 0.017 0 0
BA 0 0.005 0.005 0.002 0.004 0.009 0.032 0.036 0.046 0
MMM 0 0.008 0.008 0.026 0.007 0.009 0.005 0.044 0.026 0
DIS 0 0.003 0.003 0.006 0.026 0.012 0.025 0.035 0.014 0
MRK 0 0.014 0.006 0.019 0.012 0.009 0.012 0.022 0.023 0
GS 0 0.013 0.01 0.015 0.012 0.009 0.006 0.024 0.02 0
CAT 0 0.002 0.009 0.004 0.009 0.011 0.018 0.013 0.04 0
NKE 0.015 0.009 0.005 0.01 0.01 0.023 0.017 0.007 0 0
XOM 0 0.031 0.008 0.018 0.003 0.007 0.012 0.015 0 0
AAPL 0 0.004 0.003 0.001 0.004 0.003 0.027 0.017 0.004 0
TRV 0 0.005 0.006 0.002 0.005 0.012 0.007 0.012 0.016 0
CSCO 0.004 0.002 0.004 0.002 0.006 0.012 0.003 0.01 0 0
IBM 0 0.004 0.004 0.003 0.001 0.011 0.001 0.001 0.005 0
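
As a rough illustration of step 6, the sketch below flags cells above a cutoff and then diffs two versions of a section. The three rows are copied from the table above; the 0.2 threshold, the function name, and the sample sentences being diffed are assumptions for illustration only.

```python
import difflib

years = [2018, 2017, 2016, 2015, 2014, 2013, 2012, 2011, 2010, 2009]

# Three rows copied from the distance table.
rows = {
    "JNJ": [0, 0.01, 0.607, 0.02, 0.57, 0, 0.042, 0, 0.026, 0],
    "JPM": [0, 0.55, 0.008, 0.008, 0.013, 0.023, 0.013, 0.028, 0.427, 0],
    "IBM": [0, 0.004, 0.004, 0.003, 0.001, 0.011, 0.001, 0.001, 0.005, 0],
}

THRESHOLD = 0.2  # assumed cutoff; tune it to the distribution of distances


def flag_changes(rows, years, threshold):
    """Return (ticker, year, distance) tuples above the threshold, largest first."""
    flagged = [
        (ticker, year, distance)
        for ticker, row in rows.items()
        for year, distance in zip(years, row)
        if distance > threshold
    ]
    return sorted(flagged, key=lambda item: item[2], reverse=True)


flagged = flag_changes(rows, years, THRESHOLD)

# Hypothetical section text, split into sentences, for a flagged filing.
prior = ["We face currency risk.", "We depend on key suppliers."]
current = [
    "We face currency risk.",
    "We face cybersecurity risk.",
    "We depend on key suppliers.",
]

diff_lines = list(
    difflib.unified_diff(
        prior, current, fromfile="prior period", tofile="current period", lineterm=""
    )
)
```

With this cutoff, `flagged` contains two JNJ filings (2016 and 2014) and two JPM filings (2017 and 2010), and no IBM filings. Lines in `diff_lines` that start with `+` or `-` show the sentences added or removed between periods.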
