10-K/Q Section Text Change Detection
Tuesday, October 30, 2018

The Code

Goal

Reduce the amount of time analysts spend reading 10-K/Qs by highlighting the sections which change the most between periods.

Hypothesis

The cosine distance between the Term Frequency-Inverse Document Frequency (TF-IDF) vectors of a 10-K section in consecutive filings is a useful proxy for semantic change in that section over time.
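
For reference, the two quantities in the hypothesis can be written as below. This is the standard textbook form, offered as an assumption; the post does not say which TF-IDF weighting variant was used, so idf(t) is left generic.

```latex
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t),
\qquad
\mathrm{dist}(a, b) = 1 - \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}
```

Here a and b are the TF-IDF vectors of the same section in two consecutive filings. Because TF-IDF weights are nonnegative, the distance falls between 0 and 1, where 0 means the two sections have the same relative term weights.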

Procedure

  1. Use the Calcbench Python API Client to download the Risk Factors section of each 10-K.
  2. Tokenize the sections.
  3. Build TF-IDF matrices.
  4. Compute the cosine distance between each section and the same section from the previous filing/period.
  5. Render the matrix of distances with the largest distances highlighted.
  6. Review large changes by “diffing” documents whose distance is above a certain threshold.
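
Steps 2 through 4 can be sketched in plain Python as follows. This is a minimal illustration, not the code linked from the post: the tokenizer, the smoothed IDF variant, the function names, and the sample texts are all assumptions, and the download step is left out because the section text is simply passed in as strings.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    token_lists = [tokenize(doc) for doc in documents]
    n_docs = len(token_lists)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        # Smoothed IDF (an assumed variant) keeps terms found in every document nonzero.
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_distance(a, b):
    """Cosine distance between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        # Two empty sections are treated as identical; one empty section as fully changed.
        return 0.0 if norm_a == norm_b else 1.0
    return 1.0 - dot / (norm_a * norm_b)


def period_over_period_distances(sections_by_year):
    """Map each year to the distance between its section and the previous available year's."""
    years = sorted(sections_by_year)
    vectors = dict(zip(years, tfidf_vectors([sections_by_year[y] for y in years])))
    return {
        current: cosine_distance(vectors[previous], vectors[current])
        for previous, current in zip(years, years[1:])
    }


# Hypothetical Risk Factors text for one company; real input would come from the filings.
sample = {
    2016: "Competition may reduce our sales. Currency fluctuations may reduce earnings.",
    2017: "Competition may reduce our sales. Currency fluctuations may reduce earnings.",
    2018: "Cybersecurity breaches and new tariffs may disrupt our supply chain.",
}
distances = period_over_period_distances(sample)
```

In this sample the unchanged 2017 text scores essentially zero against 2016, while the rewritten 2018 text scores much higher against 2017. The earliest year has no entry because there is nothing to compare it with.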

Highlight Risk Factors with Greatest Change

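Each cell is the cosine distance between that year's Risk Factors section and the previous period's. The largest values (the brightest cells in the rendered matrix) mark the sections that changed the most.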
Ticker  2018   2017   2016   2015   2014   2013   2012   2011   2010   2009
JNJ     0      0.01   0.607  0.02   0.57   0      0.042  0      0.026  0
JPM     0      0.55   0.008  0.008  0.013  0.023  0.013  0.028  0.427  0
WBA     0.007  0.014  0.017  0.187  0      0.063  0.244  0.099  0      0
DWDP    0      0.24   0.135  0.045  0.033  0.02   0.033  0.01   0.068  0
V       0      0.034  0.153  0.066  0.007  0.027  0.021  0.016  0.097  0
MCD     0      0.014  0.019  0.015  0.178  0.045  0.038  0.04   0.051  0
VZ      0      0.012  0.061  0.053  0.049  0.097  0.035  0.029  0.063  0
WMT     0.061  0.056  0.025  0.084  0.03   0.076  0.02   0.024  0      0
PFE     0      0.015  0.074  0.063  0.034  0.02   0.039  0.047  0.044  0
PG      0.026  0.02   0.02   0.069  0.022  0.018  0.102  0.026  0      0
UTX     0      0.022  0.006  0.031  0.009  0.04   0.033  0.072  0.069  0
HD      0      0.037  0.018  0.03   0.12   0.017  0.012  0.02   0.025  0
CVX     0      0.013  0.028  0.045  0.02   0.001  0.005  0.075  0.057  0
INTC    0      0      0.011  0.002  0.068  0.018  0.032  0.094  0.017  0
AXP     0      0.009  0.015  0.032  0.016  0.031  0.025  0.022  0.059  0
KO      0      0.013  0.007  0.014  0.012  0.011  0.037  0.035  0.065  0
UNH     0      0.008  0.006  0.006  0.009  0.023  0.014  0.052  0.038  0
MSFT    0.024  0.013  0.012  0.012  0.031  0.012  0.035  0.017  0      0
BA      0      0.005  0.005  0.002  0.004  0.009  0.032  0.036  0.046  0
MMM     0      0.008  0.008  0.026  0.007  0.009  0.005  0.044  0.026  0
DIS     0      0.003  0.003  0.006  0.026  0.012  0.025  0.035  0.014  0
MRK     0      0.014  0.006  0.019  0.012  0.009  0.012  0.022  0.023  0
GS      0      0.013  0.01   0.015  0.012  0.009  0.006  0.024  0.02   0
CAT     0      0.002  0.009  0.004  0.009  0.011  0.018  0.013  0.04   0
NKE     0.015  0.009  0.005  0.01   0.01   0.023  0.017  0.007  0      0
XOM     0      0.031  0.008  0.018  0.003  0.007  0.012  0.015  0      0
AAPL    0      0.004  0.003  0.001  0.004  0.003  0.027  0.017  0.004  0
TRV     0      0.005  0.006  0.002  0.005  0.012  0.007  0.012  0.016  0
CSCO    0.004  0.002  0.004  0.002  0.006  0.012  0.003  0.01   0      0
IBM     0      0.004  0.004  0.003  0.001  0.011  0.001  0.001  0.005  0
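
The last step of the procedure, reviewing filings whose distance exceeds a threshold, might look like the sketch below. It uses a few values copied from the table above. The 0.2 cutoff, the function names, and the choice of Python's standard `difflib` for the diff are assumptions made for illustration, since the post does not state which threshold or diff tool was used.

```python
import difflib


def flag_large_changes(distances, threshold):
    """Return (ticker, year, distance) tuples above the threshold, largest first."""
    flagged = [
        (ticker, year, distance)
        for ticker, by_year in distances.items()
        for year, distance in by_year.items()
        if distance > threshold
    ]
    return sorted(flagged, key=lambda item: item[2], reverse=True)


def diff_sections(previous_text, current_text):
    """Line-level unified diff between two versions of a section."""
    return list(difflib.unified_diff(
        previous_text.splitlines(),
        current_text.splitlines(),
        fromfile="previous",
        tofile="current",
        lineterm="",
    ))


# A subset of the distances from the table above.
table = {
    "JNJ": {2017: 0.01, 2016: 0.607, 2015: 0.02, 2014: 0.57},
    "JPM": {2017: 0.55, 2016: 0.008, 2010: 0.427},
    "IBM": {2017: 0.004, 2016: 0.004},
}

# 0.2 is an arbitrary cutoff chosen for this example.
flagged = flag_large_changes(table, threshold=0.2)
for ticker, year, distance in flagged:
    print(f"{ticker} {year}: {distance:.3f}")
```

Each flagged ticker and year would then be passed, together with the prior period's text, to `diff_sections` so an analyst can read only what was added or removed.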
