10-K/Q Section Text Change Detection
Tuesday, October 30, 2018

The Code

Goal

Reduce the amount of time analysts spend reading 10-K/Qs by highlighting the sections that change the most between periods.

Hypothesis

The cosine distance between the Term Frequency-Inverse Document Frequency (TF-IDF) vectors of a 10-K section in consecutive periods is a useful proxy for semantic change in that section over time.
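For reference, here is one common textbook form of these quantities. In it, tf(t, d) is the count of term t in section d, N is the number of sections in the corpus, and df(t) is the number of sections that contain t. The post does not say which TF-IDF variant it uses, and libraries often smooth the IDF term, so treat this as a sketch of the idea rather than the exact weighting behind the results.

```latex
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d)\,\mathrm{idf}(t),
\qquad
\mathrm{idf}(t) = \ln\frac{N}{\mathrm{df}(t)}

d_{\cos}(\mathbf{u}, \mathbf{v})
  = 1 - \frac{\mathbf{u}\cdot\mathbf{v}}{\lVert\mathbf{u}\rVert\,\lVert\mathbf{v}\rVert}
```

TF-IDF weights are non-negative, so the cosine similarity lies in [0, 1]. The distance therefore runs from 0, where the two sections have proportional term-weight vectors, to 1, where they share no weighted terms. This matches the range of values in the table of results.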

Procedure

  1. Use the Calcbench Python API Client to download the Risk Factors section of each 10-K from Calcbench.
  2. Tokenize the sections.
  3. Build TF-IDF matrices.
  4. Compute the cosine distance between each section and the same section from the previous filing/period.
  5. Render the matrix of distances with the largest distances highlighted.
  6. Review large changes by “diffing” the documents whose distance exceeds a chosen threshold.
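Steps 2 through 4 can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the post's own code. The regex tokenizer, raw-count TF, the smoothed IDF formula (the one scikit-learn uses by default), and fitting IDF over a single company's filings across years are all assumptions. The function names are hypothetical, and `sections_by_year` stands in for Risk Factors text already downloaded in step 1. The Calcbench client call itself is not shown.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumed tokenizer: lowercase, then keep runs of letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Return one sparse TF-IDF vector (term -> weight) per document.

    TF is the raw term count; IDF is the smoothed form
    ln((1 + n) / (1 + df)) + 1, matching scikit-learn's default.
    """
    token_lists = [tokenize(d) for d in docs]
    n = len(token_lists)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: c * idf[t] for t, c in Counter(tokens).items()} for tokens in token_lists]


def cosine_distance(u: dict[str, float], v: dict[str, float]) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return float("nan")  # undefined when a section has no tokens
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 1.0 - dot / (norm_u * norm_v))


def period_over_period_distances(sections_by_year: dict[int, str]) -> dict[int, float]:
    """Distance of each year's section from the prior year's (first year omitted)."""
    years = sorted(sections_by_year)
    vectors = tfidf_vectors([sections_by_year[y] for y in years])
    return {
        year: cosine_distance(vectors[i], vectors[i - 1])
        for i, year in enumerate(years)
        if i > 0
    }


# Hypothetical toy sections, for illustration only.
sections = {
    2016: "We face competition. Regulation may change.",
    2017: "We face competition. Regulation may change.",
    2018: "Cybersecurity incidents could disrupt operations. We face competition.",
}
# 2017 is identical to 2016, so its distance is ~0; 2018 is about 0.74.
print({year: round(d, 3) for year, d in period_over_period_distances(sections).items()})
```

On real filings, scikit-learn's `TfidfVectorizer` and `sklearn.metrics.pairwise.cosine_distances` could replace the hand-rolled functions. Which corpus the IDF is fitted on (one company's history versus all companies' sections) changes which terms count as distinctive, and the post does not specify it.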

Highlight Risk Factors with Greatest Change

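Each value is the cosine distance between a company's Risk Factors section and the same section from the previous period. The brightest cells (the largest values) mark the documents that changed the most relative to the previous period.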
Ticker  2018   2017   2016   2015   2014   2013   2012   2011   2010   2009
JNJ     0      0.01   0.607  0.02   0.57   0      0.042  0      0.026  0
JPM     0      0.55   0.008  0.008  0.013  0.023  0.013  0.028  0.427  0
WBA     0.007  0.014  0.017  0.187  0      0.063  0.244  0.099  0      0
DWDP    0      0.24   0.135  0.045  0.033  0.02   0.033  0.01   0.068  0
V       0      0.034  0.153  0.066  0.007  0.027  0.021  0.016  0.097  0
MCD     0      0.014  0.019  0.015  0.178  0.045  0.038  0.04   0.051  0
VZ      0      0.012  0.061  0.053  0.049  0.097  0.035  0.029  0.063  0
WMT     0.061  0.056  0.025  0.084  0.03   0.076  0.02   0.024  0      0
PFE     0      0.015  0.074  0.063  0.034  0.02   0.039  0.047  0.044  0
PG      0.026  0.02   0.02   0.069  0.022  0.018  0.102  0.026  0      0
UTX     0      0.022  0.006  0.031  0.009  0.04   0.033  0.072  0.069  0
HD      0      0.037  0.018  0.03   0.12   0.017  0.012  0.02   0.025  0
CVX     0      0.013  0.028  0.045  0.02   0.001  0.005  0.075  0.057  0
INTC    0      0      0.011  0.002  0.068  0.018  0.032  0.094  0.017  0
AXP     0      0.009  0.015  0.032  0.016  0.031  0.025  0.022  0.059  0
KO      0      0.013  0.007  0.014  0.012  0.011  0.037  0.035  0.065  0
UNH     0      0.008  0.006  0.006  0.009  0.023  0.014  0.052  0.038  0
MSFT    0.024  0.013  0.012  0.012  0.031  0.012  0.035  0.017  0      0
BA      0      0.005  0.005  0.002  0.004  0.009  0.032  0.036  0.046  0
MMM     0      0.008  0.008  0.026  0.007  0.009  0.005  0.044  0.026  0
DIS     0      0.003  0.003  0.006  0.026  0.012  0.025  0.035  0.014  0
MRK     0      0.014  0.006  0.019  0.012  0.009  0.012  0.022  0.023  0
GS      0      0.013  0.01   0.015  0.012  0.009  0.006  0.024  0.02   0
CAT     0      0.002  0.009  0.004  0.009  0.011  0.018  0.013  0.04   0
NKE     0.015  0.009  0.005  0.01   0.01   0.023  0.017  0.007  0      0
XOM     0      0.031  0.008  0.018  0.003  0.007  0.012  0.015  0      0
AAPL    0      0.004  0.003  0.001  0.004  0.003  0.027  0.017  0.004  0
TRV     0      0.005  0.006  0.002  0.005  0.012  0.007  0.012  0.016  0
CSCO    0.004  0.002  0.004  0.002  0.006  0.012  0.003  0.01   0      0
IBM     0      0.004  0.004  0.003  0.001  0.011  0.001  0.001  0.005  0
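Steps 5 and 6 amount to picking out the large cells and then diffing the underlying text. The sketch below does both, using three rows copied from the table above. The threshold of 0.1 is a hypothetical choice, since the post only says "a certain threshold". The helper names are illustrative and are not part of any Calcbench tooling. The diff uses Python's standard `difflib.unified_diff`.

```python
import difflib

YEARS = [2018, 2017, 2016, 2015, 2014, 2013, 2012, 2011, 2010, 2009]

# Three rows copied from the distance table above.
DISTANCES = {
    "JNJ": [0, 0.01, 0.607, 0.02, 0.57, 0, 0.042, 0, 0.026, 0],
    "JPM": [0, 0.55, 0.008, 0.008, 0.013, 0.023, 0.013, 0.028, 0.427, 0],
    "WBA": [0.007, 0.014, 0.017, 0.187, 0, 0.063, 0.244, 0.099, 0, 0],
}


def flag_large_changes(distances, years, threshold):
    """Return (ticker, year, distance) for cells above threshold, largest first."""
    flagged = [
        (ticker, year, d)
        for ticker, row in distances.items()
        for year, d in zip(years, row)
        if d > threshold
    ]
    return sorted(flagged, key=lambda cell: cell[2], reverse=True)


def diff_sections(previous: str, current: str, prev_label: str, curr_label: str) -> str:
    """Line-level unified diff of two versions of a section, for manual review."""
    return "\n".join(
        difflib.unified_diff(
            previous.splitlines(),
            current.splitlines(),
            fromfile=prev_label,
            tofile=curr_label,
            lineterm="",
        )
    )


# Hypothetical threshold of 0.1.
for ticker, year, d in flag_large_changes(DISTANCES, YEARS, threshold=0.1):
    print(f"{ticker} {year}: {d:.3f}")

# Toy two-line sections standing in for downloaded Risk Factors text.
print(diff_sections("We face competition.\nRates may rise.",
                    "We face competition.\nCyber attacks may occur.",
                    "2016", "2017"))
```

With these rows, the loop prints JNJ 2016 (0.607), JNJ 2014 (0.570), JPM 2017 (0.550), JPM 2010 (0.427), WBA 2012 (0.244) and WBA 2015 (0.187), in that order. Risk Factors text often comes as long paragraphs. Splitting it into sentences before diffing, rather than lines, would likely give a more readable review.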
