[HN Gopher] Launch HN: Bedrock AI (YC S21) - Using ML to identif...
       ___________________________________________________________________
        
       Launch HN: Bedrock AI (YC S21) - Using ML to identify red flags in
       SEC filings
        
       We're Kris, Suhas, and Heather (YCS21) and we're building Bedrock
       AI (https://bedrock-ai.com/). We use machine learning to extract
       hard-to-find information and assess risk in public company reports
       (SEC filings). Our platform is used by investors to improve
       portfolio returns and mitigate downside risks.  Most public company
       data is unstructured and textual. Because relevant information is
       hard to find, a lot of corporate data is radically underused, to
       the detriment of investors. For example, our research shows it can
       take 12-18 months for corporate malfeasance to be incorporated into
       stock price after clear warning signs appear in financial text.
       Hard-to-find information that we extract includes accounting and
       governance choices, product defects, regulatory issues,
       customer/market reliance and much more.  One example is Sino-
       Forest, Canada's Enron. Sino-Forest was a darling of Canadian
       investors until an infamous expose, by short-seller Muddy Waters,
       in 2011. It turned out it was a forestry company that didn't
       actually own any forests. Months before the expose and crash, there
       were obvious red flags in the company's disclosures including
       buying and selling from companies controlled by their directors and
       problems with the review of their bookkeeping! Our algorithms
       picked up these red flags and more, and assessed Sino-Forest as
       high risk when we ran our models on the company's historical
       filings.  I'm a CPA and a developer (odd combo). The tech community
       has largely ignored public company financial disclosure. A few
       years ago, I published a basic piece using computational methods to
       analyze cannabis disclosure. The local regulatory agency contacted
       me to give them a workshop on text analytics. It was then that it
       hit home how little was being done in the field.  Information
       drives financial markets. The difficulty of assessing risks hidden
       in long public filings makes earnings manipulation, and even fraud,
       both possible and profitable. Earnings manipulation involves using
       the flexibility in accounting standards to make financial
       statements look better than reality. This is easier than most
       people realize because accounting involves MANY choices and
       estimates.  There is money to be made by accessing and trading on
       underused predictive signals. Making money by stopping fraud is a
       win-win situation.  There are two main technical challenges
       thwarting progress in the field: (1) NLP models work best on short
       (500 character) text, but financial filings are hundreds of pages
       long, and (2) important and unimportant language sounds very
       similar in financial text. For instance, this sentence sounds like
       it could be indicative of terrible things going on behind the
       scenes but is, in fact, just boilerplate disclosure: "We face risks
       and uncertainties related to litigation, regulatory actions and
       government investigations and inquiries." You can see how ML models
       easily get confused.  There's a big gap in both academia and
       industry. A lot of effort is being put into forcing results from
       non-existent linguistic signals. Models that claim to predict
       specific outcomes often don't hold up to scrutiny in practice.  In
       order to overcome these technical challenges, we used supervised and
       semi-supervised learning with high-quality labels, focused on
       tangible facts represented in textual context, and adapted
       language models using domain expertise.  As far as we know, no
       other solution is able to identify problematic/risky disclosure
       algorithmically. Using search terms to do something similar results
       in overwhelming noise. The disclosure selected by our algorithms is
       highly predictive of downside risk - validated in deployment and
       also in backtesting.  We launched our core product in April 2021
       (see https://bedrock-ai.com) and it's used by hedge funds and
       institutional investors. We're also doing a pilot to support
       Canadian securities regulators (https://bit.ly/3wOwOj6). We've also
       just launched a minimalist free site, Ledge (https://ledge.bedrock-
       ai.com), to help retail investors stay up to date on material
       events at companies they follow. Companies are required to disclose
       material events between their quarterly reports, but these
       disclosures rarely make the news.  Our core/premium product is
       currently only available to institutions, in part because retail
       investors generally don't prioritize risk management and therefore
       aren't committed customers. We plan to expand the free site and
       better support individual investors as we grow.  We would love to
       hear from you. Have you tried to read annual reports and gotten
       lost in the weeds? What has your experience been in making NLP
       models work on financial text?
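        
       To make the long-document challenge concrete, here is a toy
       sketch of the naive baseline: chunk the filing into sentences and
       score each one with an off-the-shelf short-text classifier. This
       is illustrative only, not our production pipeline, and the model
       shown is just a generic sentiment placeholder:
        
         # Naive baseline, for illustration only.
         import nltk
         from transformers import pipeline
        
         nltk.download("punkt", quiet=True)
        
         def naive_red_flags(filing_text, top_k=30):
             sentences = nltk.sent_tokenize(filing_text)
             clf = pipeline(
                 "text-classification",
                 model="distilbert-base-uncased-finetuned-sst-2-english")
             scored = [(s, clf(s[:512])[0]["score"]) for s in sentences]
             # Problem: boilerplate ("We face risks and uncertainties
             # related to litigation...") scores the same as a genuine
             # red flag, which is why the naive approach falls over.
             return sorted(scored, key=lambda x: x[1], reverse=True)[:top_k]
        
       In practice the interesting work is in what replaces that generic
       placeholder and in separating boilerplate from real disclosure.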
        
       Author : kbennatti
       Score  : 104 points
       Date   : 2021-07-20 13:00 UTC (10 hours ago)
        
       | mynegation wrote:
       | How would you prevent would-be filers from using your own product
       | to iterate on their wording until the red flags are removed?
        
         | kbennatti wrote:
         | We don't sell to filers. Ever. Got to protect against conflicts
         | of interest. That's one of the reasons we don't have a true
         | free version of the product.
         | 
         | That said, we're using language models so replacing words with
         | synonyms won't evade the model as long as it's expressing the
         | same thing.
        
           | DSingularity wrote:
           | Do you have any deeper protections than simply not selling to
           | filers? It doesn't seem so hard for a motivated filer to
           | circumvent this by using friendly hedge funds to lend their
           | licenses.
        
             | piesauce wrote:
             | We constantly retrain our red flag models and they are
             | tested for robustness (whichever way the company decides to
             | express the existence of a risk, like say an investigation,
             | we ensure that the models pick it up).
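             | 
             | Very roughly, the check is in the spirit of this
             | hypothetical sketch (made-up names, not our production
             | tests): take a risk we care about, write paraphrases of
             | it, and assert the model still fires on every one.
             | 
             |   # Hypothetical robustness check; names are made up.
             |   PARAPHRASES = [
             |       "We are under investigation by the SEC.",
             |       "The Commission opened a formal inquiry into us.",
             |       "We received a subpoena about revenue recognition.",
             |   ]
             | 
             |   def check_robustness(risk_model, threshold=0.5):
             |       # risk_model: any callable mapping a sentence to
             |       # the probability that it describes a real risk.
             |       for s in PARAPHRASES:
             |           assert risk_model(s) >= threshold, f"missed: {s}"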
        
         | htrp wrote:
         | Realistically they probably don't need to do that. The filings
         | are usually pretty heavily lawyered up and the lawyers are
         | pretty lazy on language updates.
        
           | kbennatti wrote:
           | Yeah, people have been doing sentiment analysis of press
           | releases etc. for years now, and very few corporates bother
           | trying to use software to test or counteract it.
        
             | hogFeast wrote:
             | Companies are already doing this. IR consultants have,
             | allegedly, been advising exec teams to use/not use certain
             | words based on their impact on AI models. It is inevitable.
             | 
             | SEC filings are somewhat more robust because they are a
             | standardized format but the point of many of these frauds,
             | for example Sino-Forest, was about things which weren't in
             | the document, not the things that were (the stuff that was
             | in documents was never a smoking gun...iirc, I remember
             | looking at the stock before it happened...a lot of these
             | Chinese frauds had totally fine numbers though, that was
             | the point).
        
       | vmception wrote:
       | Given your example boilerplate disclosure, is one of your key
       | conditional statements simply categorizing boilerplate? Like
       | instead of simply putting a sentiment score on the wording, you
       | first isolate it as boilerplate or as a unique potential
       | aberration and then assign weights?
        
         | kbennatti wrote:
         | You've hit the nail on the head (mostly)...but I'm going to say
         | no more because it's part of our secret sauce.
        
           | vmception wrote:
           | Yes, I figured! I found your synopsis to be inspirational
           | 
           | An additional tool I always wanted was to match external -
           | even macroeconomic - events to company risk factors
        
       | andreabee wrote:
       | Congratulations! I played with that database in 2017 when they
       | opened up the text. I'm so happy to see it's finally been picked
       | up.
       | 
       | I built a thing which recreated the Income statements and then
       | flagged for non-conformance. I found Babcock & Wilcox Enterprises
       | "Good will impairment charges" as an anomalous line in their Nov
       | 8, 2017 filing just prior to when their CEO resigned in 2018 and
       | they had to make so many adjustments. Unfortunately, I'm just a
       | data geek. I did have a financial advisor friend who was VERY
       | interested, but we couldn't get enough interest to validate
       | development, so I moved on to other projects.
       | 
       | I'm no finance whiz, but that database is a trove. So happy to
       | see a company develop around giving it a go!
        
         | tomrod wrote:
         | Which database?
        
           | piesauce wrote:
           | I assume they are talking about EDGAR -
           | https://www.sec.gov/edgar/search-and-access
        
         | piesauce wrote:
         | Thanks for the wishes. That's really cool that you were able to
         | do that!
        
         | kbennatti wrote:
         | Seconding Suhas here. Super rad
        
       | nodesocket wrote:
       | Do you have any plans to take S-1 filings and make "cliff note"
       | versions, i.e. distill down to core financials and important
       | data?
        
         | piesauce wrote:
         | Yes, this is at the core of what we do - taking a document with
         | 2000 sentences like an annual report and distilling it down to
         | the 20-30 sentences that actually matter. We do have plans to
         | process S-1 filings, and we are currently working on building
         | abstractive summarization capabilities.
        
       | jpkotyla wrote:
       | Quant with some NLP experience here. Impressive business traction
       | so quickly, good stuff. Who do you see as competitors in this
       | space? I'm wondering whether you think that footnoted*
       | (https://footnoted.com) offers something similar, or might your
       | products be complementary in some way? Thanks.
        
         | kbennatti wrote:
         | We love footnoted! Michelle Leder of footnoted has been really
         | supportive of our work. She recognizes that it's an important
         | area for tech innovation. footnoted isn't operating right now
         | but when it comes back, we'll support her and collaborate if we
         | can.
         | 
         | We're often compared to AlphaSense, Sentieo and InsiderScore
         | but our product is pretty different. Competitor products focus
         | on sentiment and linguistic metrics or search etc, not on
         | extracting and organizing important textual content.
        
       | polpenn wrote:
       | Great work! I work in financial services and built a similar tool
       | for sentiment analysis and topic modelling on transcribed
       | earnings calls. The idea was to identify topics at a speaker-
       | level (analyst, management, etc.) and evaluate the sentiment
       | around that topic. For example, perhaps "foreign exchange" was
       | discussed negatively by a given company in an earnings call,
       | which would alert the analyst to review that call in greater
       | detail.
       | 
       | Are you guys thinking about incorporating something like this
       | into your product?
        
         | piesauce wrote:
         | Thanks! Earnings transcripts sentiment analysis is not
         | currently in our product roadmap. One of the reasons is that
         | such tools are already available on the market. We wanted to
         | explore more underutilized sources of information.
        
       | paisible wrote:
       | Congrats on the launch guys! Fellow Canadian from Montreal here
       | :) I'm curious, what tools / methodology did you follow to
       | generate the high-quality labels, and how many different labels
       | did you end up generating? I'm also very curious whether you view
       | the discovery and generation of new labels (and accompanying
       | high-quality training datasets) as a continuing and core part of
       | your development going forward?
        
         | kbennatti wrote:
         | Ooh I love MTL. That's the first place I lived in Canada. Great
         | q. We used SEC enforcement actions related to fraud as our gold
         | label (fairly common practice in academia). Really key thing
         | here is that you need to be careful about what years you're
         | using for training because if you include years that are too
         | late in the fraud cycle, you end up with significant target
         | leakage, e.g. the filings will say, "we're being investigated
         | for fraud". We ended up manually reviewing all of our
         | data/labels. It took over a month. We also use settled class
         | action lawsuits as silver labels. Plus a few other more
         | frequent labels as bronze labels
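         | 
         | If it helps, here is a toy sketch of the leakage guard
         | (field names are hypothetical, not our actual data model):
         | keep only filings filed during the fraud period but before
         | the problems became public.
         | 
         |   # Toy leakage guard; field names are hypothetical.
         |   def label_filings(filings, actions):
         |       labeled = []
         |       for f in filings:
         |           a = actions.get(f.cik)   # SEC action, if any
         |           if a is None:
         |               labeled.append((f, 0))   # no known fraud
         |           elif a.fraud_start <= f.filed_on < a.went_public:
         |               labeled.append((f, 1))   # gold positive
         |           # Filings after the fraud surfaced are dropped:
         |           # they say "we're being investigated", which is
         |           # target leakage, not a predictive signal.
         |       return labeled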
        
           | paisible wrote:
           | Thanks for the quick answer! Follow-up around this as it's a
           | space I'm actively working in: did you use or build any tools
           | for the labeling process, or was it Excel? :D Also, do you
           | ultimately see/position your solution as an AI-powered
           | exploration tool that allows humans to derive better
           | insights, faster (but where the NLP side of things is simply
           | to assist in this discovery process), or do you see the
           | models (and resulting flags) eventually being able to
           | completely replace the human intuition?
        
             | piesauce wrote:
             | Our annotation process has been manual so far, but we are
             | working on building something to make it more efficient :-)
             | We see our solution as an AI-powered assistant for
             | qualitative research that makes the job of an analyst much
             | easier, and don't see it as 'replacing' humans for the
             | foreseeable future.
        
       | [deleted]
        
       | ZeroCool2u wrote:
       | I've worked with 10-K's and 8-K's extensively for the purposes of
       | using them for NLP. This is extremely arduous work and a clear
       | winner in terms of profitable ideas, so kudos to the team for the
       | launch, this is really impressive.
       | 
       | Perhaps this is giving a bit too much away in terms of the secret
       | sauce, but would love if you could talk a bit about how you
       | handle the wild disparities in the structure of the documents. Do
       | you parse the XBRL?
        
         | piesauce wrote:
         | Thanks for the kind words! We don't use XBRL at all. We did try
         | it initially, but it was wildly inconsistent across companies.
         | I think one of the things that worked well for us was that we
         | spent a lot of time at the initial stages of the pipeline
         | (efficient sentence and word tokenization, span detection),
         | which boded well for our models later on.
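         | 
         | In spirit it looks something like this (generic libraries
         | shown for illustration; our production tokenizers are
         | custom): strip the EDGAR HTML, normalize whitespace, then
         | split into sentences.
         | 
         |   # Rough sketch of the front of the pipeline.
         |   import re
         |   import spacy
         |   from bs4 import BeautifulSoup
         | 
         |   nlp = spacy.load("en_core_web_sm")
         | 
         |   def filing_to_sentences(raw_html):
         |       text = BeautifulSoup(raw_html, "html.parser").get_text(" ")
         |       text = re.sub(r"\s+", " ", text)   # drop page artifacts
         |       # Boundaries are the hard part: "Inc.", "No. 10-K"
         |       # and "$1.2" all trip up naive sentence splitters.
         |       return [s.text.strip() for s in nlp(text).sents
         |               if len(s.text.split()) > 3]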
        
           | ZeroCool2u wrote:
           | Thanks! This is similar to where I ended up landing as well.
           | It turns out using a non-standardized standard format is
           | practically worse than dealing with giant blobs of plain
           | text!
        
             | kbennatti wrote:
             | So true
        
       | fritzi wrote:
       | There have been several popular books about famous fraudsters. I
       | suspect that you come across some facts that would be interesting
       | (if not profitable) not just to institutional investors but to
       | the average nerd, or maybe someone looking for an idea for the
       | next bestseller.
        
         | bambax wrote:
         | "Lying For Money" by Dan Davis has several chapters about
         | financial fraud. It's a great read.
        
         | kbennatti wrote:
         | Like this? This Oxford City Football team, a boiler room
         | operation, duped people by telling them they were contractually
         | bound via voice-recognition software and "proved" it by making
         | some beeping sounds over the phone.
         | https://www.sec.gov/litigation/litreleases/2017/lr23869.htm
        
         | kbennatti wrote:
         | Or this? John Rohner claimed that he'd developed, tested, and
         | patented a "plasma engine" fueled by inexpensive and abundant
         | noble gases. He also claimed that he graduated from Harvard at
         | 14 and had 3 PhDs from M.I.T. Somehow he managed to dupe 98+
         | investors.
        
       | MattGaiser wrote:
       | Former equity analyst intern and now a software engineer here. My
       | Dad also works in the investment world (and is a CPA and a
       | developer coincidentally) and after Sino-Forest happened, he
       | wanted someone to parse annual reports and AIFs to create a
       | "weasel word index." Ever thought of doing that?
       | 
       | Basically rank companies in estimated honesty by the language
       | they choose to use.
        
         | kbennatti wrote:
         | Aw cool! Can I hang out with your Dad? ;) We do pick up on
         | overly promotional/jargon-y language to some extent. For the
         | most part, however, word lists haven't worked for us.
        
       | scottydelta wrote:
       | Hey guys, congrats on the Launch. Is there any API to get the
       | metrics you guys calculate on SEC Filings?
       | 
       | I recently launched https://quantale.io which is a web-based
       | Bloomberg Terminal alternative and we monitor SEC filings in
       | real-time to show them to users[1].
       | 
       | It would be great to show additional data related to the SEC
       | filings if there was an API.
       | 
       | [1] https://quantale.io/dashboard/sec-filings
        
         | nodesocket wrote:
         | How are you different than the native macOS WeBull application?
         | WeBull comes the closest I have seen to a Bloomberg terminal.
        
           | scottydelta wrote:
           | WeBull only provides financial and news media data about
           | stocks, but Quantale is the first of its kind to provide not
           | only financial and news media data but also real-time data
           | from the SEC, Reddit, and Twitter.
           | 
           | Along with that, additional features include:
           | 
           | - Top trending stocks from the internet in real-time
           | 
           | - Sentiment of the discussions, using a PyTorch model.
           | 
           | - Ability to save posts (Reddit, Twitter, news headlines, SEC
           | filings)
           | 
           | - Ability to create watchlists to monitor a group of
           | tickers, like SPACs.
           | 
           | Features in Roadmap:
           | 
           | - Alerts on change in the activity of stocks
           | 
           | - Level 2 Data
           | 
           | - Options Data
           | 
           | - Brokerage so that users can trade directly from Quantale
           | 
           | Please try it out at https://quantale.io and any feedback is
           | appreciated. Feel free to reach me at vikash@quantale.io
        
         | kbennatti wrote:
         | Sweet! The world needs a Bloomberg alternative so props. We
         | don't have an API atm. We're focused on supporting human
         | analysts. The meat of the product is the red flags which are
         | textual and not quantitative.
        
           | scottydelta wrote:
           | If you think you could use more data, such as news headlines,
           | Reddit and Twitter chatter, and forum discussions about the
           | companies in the filings, then let me know; Quantale can
           | provide that. The additional data could augment the current
           | fraud detection.
        
             | kbennatti wrote:
             | Interesting. Potentially? How do we get in touch?
        
               | scottydelta wrote:
               | email me at vikash@quantale.io
               | 
               | thank you
        
       | bambax wrote:
       | > _Most public company data is unstructured and textual._
       | 
       | This is surprising to me.
       | 
       | Wouldn't it be possible to make a very short list of, say, 12
       | blunt questions that would help flag fraud?
       | 
       | Of course the company could be lying in the answers. (If it was
       | found to be lying in any one answer then it would automatically
       | be flagged as "high risk".)
       | 
       | But this may not even be needed, since what you seem to be saying
       | is that the information about fraud is there in plain sight, yet
       | hard to see, because it is drowned in fluff and periphrases.
       | 
       | Is there an opportunity for a private company, say a rating
       | company, to make and distribute such a questionnaire? Or does it
       | already exist in some form?
        
         | HeatherJudd wrote:
         | An interesting idea. I'm not sure if, say, 12 or so questions
         | could cover the risks, or if it would just add one more data
         | point for people to (not) read.
         | 
         | There are certainly some sorts of transactions that are much
         | more risky than others - so having an easy source for these
         | transactions would be useful (and we'd build it into our
         | models). But often the signals are more subtle. Companies with
         | overly aggressive accounting policies across the board tend to
         | have completely different undisclosed problems. Since there are
         | estimates and judgement involved in all areas of accounting,
         | looking at the aggregate impact of all policies can be
         | important.
         | 
         | On adding more information - The SEC adopted rules to modernize
         | disclosures of risk factors in 2020, requiring a summary risk
         | factor disclosure if the risk factor section exceeds 15 pages
         | (https://www.sec.gov/news/press-release/2020-192).
        
       | hbcondo714 wrote:
       | Congrats! Did you by chance work with the YC team that launched
       | MarketBrief? I don't think they are around anymore but they did
       | SEC Filings too, albeit 10 years ago:
       | 
       | https://techcrunch.com/2011/08/15/yc-funded-marketbrief-make...
       | 
       | Disclosure / Shameless Plug: I work on https://last10k.com ...a
       | consumer offering for reading 10K/Q reports more efficiently
        
         | piesauce wrote:
         | Thanks for the wishes! I have never heard of MarketBrief, this
         | is very interesting.
         | 
         | We probably couldn't have done back then what we are able to
         | do today, because the ML scene was quite nascent.
        
       | Invictus0 wrote:
       | Does your model flag NKLA? Nikola Motors, ostensibly a fuel cell
       | vehicle company, famously reported $36,000 in "solar revenue"
       | (they got a contract to put solar panels on someone's house)
       | while simultaneously having a market cap of over $4 billion.
       | 
       | If so, can you comment on what your model considers red flags in
       | Nikola's SEC filings?
       | 
       | https://hindenburgresearch.com/nikola/
       | 
       | https://sec.report/Document/0001731289-20-000012/
        
         | kbennatti wrote:
         | Yes, it did. BUT here's the thing -> Nikola was first a SPAC
         | and by the time it filed its first filing as Nikola, the
         | Hindenburg report was already out, so it's not a true example of
         | our algorithms preceding short reports. Our algos did beat
         | Hindenburg to the punch on a number of other companies though.
         | 
         | A bit outdated but check this out -
         | https://bedrock.substack.com/p/bedrock-ai-vs-activist-shorts
        
           | kbennatti wrote:
           | Nikola red flags for you! (all algorithmic)
           | 
           | 1. "For example, in September 2020, our founder and former
           | executive chairman, Trevor R. Milton, stepped down from his
           | positions with us."
           | 
           | 2. "During the fourth quarter of 2020, the Company ceased
           | operations related to the Powersports business unit in order
           | to focus on the Company's primary mission of commercial
           | production of semi-trucks and construction of hydrogen
           | fueling stations."
           | 
           | 3. "As of December 31, 2020, we have $46.3 million of prepaid
           | in-kind advisory services remaining which is expected to be
           | consumed in 2021 and will be recorded as research and
           | development expense until we reach commercial production."
        
             | kbennatti wrote:
             | In-kind services are always a red flag because it's easy to
             | fudge the accounting.
        
             | kbennatti wrote:
             | just a sample
        
       | Seabiscuit wrote:
       | Interesting site. I currently work at a hedge fund, but have a
       | small dose of NLP in my academic background, so it's always
       | interesting to see concepts like this come out.
       | 
       | Two questions:
       | 
       | - Are you using EDGAR's 'Facts' function? It seems
       | to make SEC Filings a lot more like structured text than they
       | have been previously, but I haven't seen really convincing tools
       | developed to use it yet
       | 
       | - How/do you ever see yourself interfacing with similar 'red
       | flag' screening tools that just work on the numerical side i.e.
       | accounting ratios?
       | 
       | Also, you've got a grammatical error on your Values and Vision
       | page. Normally I wouldn't comment to point that kind of thing
       | out, but for an NLP startup it seems more appropriate ('its
       | volume' not 'it's volume')!
        
         | kbennatti wrote:
         | Thanks for the website edit! Fixed.
         | 
         | We don't rely on XBRL for parsing. It's not very
         | consistent/reliable, and it's mostly for numeric content. We've
         | definitely considered integrating ratios into our dashboard. It
         | isn't a current priority because ratios are
         | already well supported elsewhere.
        
           | somberi wrote:
           | Asking to learn - ex-Algo here. May I ask where else ratios
           | are represented? Thanks, and all the best for your launch.
           | Very useful service.
           | 
           | From experience, I would suggest you have a way for the manager
           | of a fund, a desk, or a bank to see the usefulness. You
           | will have good pull from the line staff, but selling to the
           | managers is the hard part.
        
             | piesauce wrote:
             | Thanks for your wishes! For ratios, we love Alpha Vantage.
             | They are great value for money.
        
       | danielmarkbruce wrote:
       | Out of interest - why didn't you start a hedge fund instead?
        
       | mgl wrote:
       | Really interesting and useful if it works.
       | 
       | Do you consider validating financial figures against Benford's
       | Law distribution? Whilst it is not NLP, I am curious whether it
       | still works.
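       | 
       | For reference, the check itself is tiny: the leading digit d of
       | naturally occurring figures should appear with frequency
       | log10(1 + 1/d), so you just compare observed frequencies against
       | that, e.g.:
       | 
       |   # Benford check: compare leading-digit frequencies with
       |   # the expected log10(1 + 1/d) distribution.
       |   import math
       |   from collections import Counter
       | 
       |   def benford_deviation(figures):
       |       digits = [int(str(abs(x)).lstrip("0.")[0])
       |                 for x in figures if x]
       |       if not digits:
       |           return 0.0
       |       obs, n = Counter(digits), len(digits)
       |       return sum(abs(obs.get(d, 0) / n - math.log10(1 + 1 / d))
       |                  for d in range(1, 10))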
        
       | c3pa wrote:
       | > Our algorithms picked up these red flags and more, and assessed
       | Sino-Forest as high risk when we ran our models on the company's
       | historical filings.
       | 
       | When you're backtesting your models, how do you distinguish
       | between novel fraud that the industry is _now_ aware of vs. fraud
       | that was visible but ignored -- if your model has learned _from_
       | Sino-Forest, how do you know it would have caught Sino-Forest at
       | the time?
       | 
       | > For instance, this sentence sounds like it could be indicative
       | of terrible things going on behind the scenes but is in fact,
       | just boilerplate disclosure: "We face risks and uncertainties
       | related to litigation, regulatory actions and government
       | investigations and inquiries." You can see how ML models easily
       | get confused.
       | 
       | Humans too, you're describing at least one comment in every HN
       | thread about an SEC filing :D
        
         | kbennatti wrote:
         | > Humans too, you're describing at least one comment in every
         | HN thread about an SEC filing :D
         | 
         | LOL. Yup!
        
         | kbennatti wrote:
         | Great q. Sino-Forest is an out-of-sample test so our models
         | didn't technically "learn" from it. That said, very valid
         | comment. Historical testing only goes so far. Assessing whether
         | our algorithms work in deployment has been cool. Check out some
         | of our live, in deployment examples here -
         | https://bedrock.substack.com/p/bedrock-ai-vs-activist-shorts
        
           | naturalauction wrote:
           | >https://bedrock.substack.com/p/bedrock-ai-vs-activist-shorts
           | 
           | How often are companies rated with a risk factor this high?
           | As in, does a risk factor in the 80s mean that fraud is
           | extremely likely, or is it just notifying humans that this
           | filing might be worth reading over with a fine-toothed comb?
        
             | piesauce wrote:
             | Less than 10 percent of companies have a risk score above
             | 80. Our historical testing shows that around 1/3rd of
             | companies with scores above 80 turn out to be fraudulent.
             | (It is hard to test this, so this might be an overestimate.)
        
       | danicgross wrote:
       | Interesting.
       | 
       | How do you think about backtesting? There are a few short-only
       | shops that specialize in finding frauds. If you get their
       | historical 13-Fs, how would you score against them in terms of
       | precision/recall?
       | 
       | And I guess more broadly, how does alpha with your system compare
       | to a portfolio that holds all short positions by big long/short
       | funds (ex thematic shorts)? Meaning, those guys have full-time
       | humans that focus on this... can you beat them? Very interesting
       | if so.
        
         | kbennatti wrote:
         | RE: backtesting we use SEC enforcement actions related to fraud
         | (10b-5) as our gold label. That said, there is no gold label
         | for absence of fraud, so our real performance is probably
         | slightly better than the backtested metrics suggest.
         | 
         | We've never tried to score a short fund on their
         | precision/recall but unofficially, Hindenburg Research has the
         | highest concordance with our models in deployment.
        
         | kbennatti wrote:
         | Re: alpha -> our focus is on extracting red flags that are
         | similar to what a forensic accountant/analyst would find. AI-
         | assisted research rather than AI-driven. Trading on fraud
         | signals alone is pretty hard; you need another event. We haven't
         | done quant testing of our risk scores in a while. We definitely
         | should do proper quant backtesting though.
        
       | shum1 wrote:
       | As someone who has looked into alternative data business models
       | for the finance industry, this is really awesome to see someone
       | doing this as a company. I was interested to understand how you
       | think about your revenue model. I feel that if your data provides
       | alpha (i.e. selling before other people are aware of the
       | problematic disclosures), as your models become validated within
       | the industry, someone/some firm is going to use it to generate
       | alpha. But then you have a problem where that one firm captures
       | most of the value and takes it from other participants, who now
       | lose the value-add of your product.
       | 
       | How do you balance those two sides? I mean it as a potential
       | customer who would love to pay for your product, but I want to
       | understand how you prevent this from becoming an alpha-generating
       | NLP strategy for the one hedge fund that pays the most for it.
        
         | HeatherJudd wrote:
         | It's definitely something we've discussed as a team. We would
         | like to help make fraud less profitable.
         | 
         | The current iteration of the product requires users with some
         | level of financial expertise - which is why we are starting with
         | fundamentals investors. We believe that the longer a fraud goes
         | on, the more people get hurt - so we want to bring these issues
         | to the forefront. Perhaps each trade can be considered a zero-
         | sum game, but long-term there is a benefit to all market
         | participants. We love the idea of every investor considering
         | how aggressive accounting/reporting informs management
         | integrity. Unfortunately, I think we have some time before this
         | becomes the norm.
        
       | tomrod wrote:
       | Very intriguing.
       | 
       | Can you give a sense of your model(s) metrics? Sensitivity and
       | specificity in validation/test? If you're open to it,
       | explainability assessments?
       | 
       | What is your market size? Can your models transfer to other
       | compliance spaces?
        
         | kbennatti wrote:
         | Our product has two very distinct outputs: 1) red flags and 2) a
         | risk score. Our primary value-add is the red flags which are a
         | qualitative input into an analyst's process. We find the
         | information you need to see without having to spend 3 hours
         | reading nonsense. The two components get tested very
         | differently. The red flags explain (sort of) the risk scores
         | but not in a truly explainable fashion. The risk scores are
         | optimized for precision and approximately 1 in 3 companies with
         | a "fraud" score will end up with an SEC investigation or
         | equivalent.
         | 
         | Yes! the NLP research we've done can be adapted to other tasks.
        
       | xgboosting wrote:
       | Hey there, I have some experience summarizing 10-K and 10-Q's for
       | a personal project. I clustered BERT embeddings, then selected the
       | clusters that had a high correlation with non-standard price
       | deviation in the next 20 days. You can check out the results on
       | https://eclect.us/
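       | 
       | Rough shape of the approach, if anyone wants to play with it
       | (sentence-transformers here standing in for the raw BERT
       | embeddings I used; a toy version, not the exact code behind
       | the site):
       | 
       |   # Toy version: embed each sentence, cluster, then (not
       |   # shown) keep the clusters that correlate with abnormal
       |   # price moves over the following 20 trading days.
       |   from sentence_transformers import SentenceTransformer
       |   from sklearn.cluster import KMeans
       | 
       |   def cluster_sentences(sentences, n_clusters=50):
       |       model = SentenceTransformer("all-MiniLM-L6-v2")
       |       embeddings = model.encode(sentences)   # (n, 384)
       |       km = KMeans(n_clusters=n_clusters, n_init=10)
       |       return km.fit_predict(embeddings)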
       | 
       | I'd love to learn more about the methods you're using and I would
       | also like to know if you're ever looking to hire more developers.
        
         | piesauce wrote:
         | I had a look at your website and the related blog post. It
         | looks pretty cool! We are working on some abstractive
         | summarization projects but haven't deployed them yet. We are
         | not hiring right now, but please do send me a note at suhas @
         | the bedrock ai domain.
        
       | vmception wrote:
       | > For example, our research shows it can take 12-18 months for
       | corporate malfeasance to be incorporated into stock price after
       | clear warning signs appear in financial text.
       | 
       | One thing I noticed about "Efficient Market Theory" is that it is
       | unfalsifiable. It isn't a scientific theory and it also cannot be
       | proved true or false, only useful when convenient. It relies on
       | magic, rationality, and the assumption of omniscience by large
       | investment banks.
       | 
       | Nothing is priced in.
        
         | kbennatti wrote:
         | I love that. I agree until the "nothing" bit. Some things do
         | get priced in. You can see specific stocks "react" to news,
         | adjusted for market returns etc. ESG factors do appear to be
         | getting "priced in" as well
        
       | riku_iki wrote:
       | > in the company's disclosures including buying and selling from
       | companies controlled by their directors
       | 
       | How did you find that companies are controlled by their
       | directors?
        
         | kbennatti wrote:
         | It's disclosed in their filings e.g. "Among the vendors were a
         | director of the Company and an entity controlled by such
         | director"
         | 
         | A lot of egregious things are disclosed on page 101 of a
         | filing but they get missed because these filings are so long
         | and deathly boring
        
       ___________________________________________________________________
       (page generated 2021-07-20 23:01 UTC)