(00:00:00)
Kyle Mack: We try to have our products operate on this principle of like, you should be able to provide us sort of the minimum amount of information that you have about an entity, and then we should do the hard work to go from that data to a far more robust picture of that entity and set of information.
(00:00:26)
Ansel Parikh: Welcome to another episode of The Current, a bi-monthly podcast exploring the intersection of people, finance, and data. I'm Ansel Parikh, co-founder of Finch, the connectivity platform for the employment ecosystem. Today, I'm joined by Kyle Mack. Kyle Mack is the co-founder and CEO of Middesk, the leading business identity platform modernizing business verification, risk evaluation, and compliance. Kyle, welcome to the show.
Kyle: Thanks for having me.
Ansel: Yeah. Super excited for the conversation today, and I really appreciate you having taken some time. For anyone who's not familiar with Middesk, though, can you tell us a little bit more about the company? Like, what do you do and who do you do it for?
Kyle: Yeah, for sure. So to your point, we're an identity company, but we specifically focus on all the challenges around two companies working together. So we think about ourselves as really sort of business identity or business identity intelligence. What does that mean in practice? So it means we have about 600 fintech and financial service companies that use Middesk to power different parts of their customer journey, from application through onboarding, and then post-onboarding monitoring to help them identify potential risks in their customer base.
So to really make it tangible, if you open a startup and you open up a bank account at Mercury or Brex, or you end up using Ramp, they have a set of obligations to know who their customers are and to evaluate any potential risk of those entities at and after onboarding, and they leverage Middesk's products under the hood to help them verify those entities and make sure they're finding any potential risks as efficiently as possible.
(00:02:01)
Ansel: Yeah. Well, I'm seeing, not by accident, some parallels to Checkr, where you spent a bunch of time back in, like, 2015. You were the first business hire, and during your time there, you were working on some pretty similar challenges. What was the moment you realized, "Hey, businesses have these similar types of issues"?
What was the thing that gave you enough conviction to go, "Hey, we should build a company around this"?
Kyle: Yeah, for sure. So one of the things that we had to do around customer onboarding at Checkr was to make sure every customer we signed up was a legitimate business. So when you run somebody's background check, you know, let's say at the point of hiring, you're gonna collect Social Security numbers.
You have their address history, their date of birth, name. It's a lot of sensitive information, and then the results of the product are sensitive as well, like potential criminal history, driving record infractions. And so we had this requirement to make sure we knew who all of our customers were, that they were legitimate businesses, and that they had, I think the exact phrasing was, an approved permissible purpose for being able to use the product.
(00:03:03)
In order to do that, we added a whole ops team, and that was their job. We called it the credentialing team. And so they did a bunch of people-powered processes to make sure our customers were legit. They were Googling them, taking screenshots of people's websites. It was brutal.
Imagine you're a small business owner. You have a coffee shop or something. You hire your employee. They're starting today. You would go to Checkr to try to run a background check, and then you would sit in a queue for anywhere from three to five days before you could actually start running the reports.
And so, yeah, that was the idea for Middesk. At the time we said, "Let's build the credentialing APIs." And then as we got into it, we came to learn that what we called credentialing is what people within financial services were doing for their know-your-customer programs, using that to mitigate different parts of compliance and risk.
And then the more we got into the actual business itself, the blueprint for how to run a background check on somebody is, you take their name and date of birth, then you try to, as accurately as you can, match that against hundreds of millions of criminal records, and only return the ones that belong to the person. And we found that business data had a very similar problem.
(00:04:14)
It was super fragmented government data. There was no central registry for it. It was a lot of underlying data work, entity resolution work, and we could do it as kind of a developer-first product to be able to solve the problem.
Ansel: Yeah. No, I love the developer-first focus. I think there are a lot of different ways you could approach it. You could throw people at it. You could do a lot of manual processes. You could call up these government entities, apparently. I've seen people do that. And so I guess, when you looked at the problem space, why was an API the main focus as a starting point to build the infrastructure?
(00:04:52)
Kyle: Yeah. So our customers wanna use the products at the point of onboarding, which means they're usually trying to do real-time decisioning. So responses typically need to come back in under three seconds. And so everything that we do, we're trying to do in as near real time as possible. For what it's worth, there are certain use cases out there that don't have those same conditions.
So, you know, let's think about originating a $100 million commercial mortgage. Like, we don't need to make that decision in three seconds. There's gonna be people involved in the process. But when you're talking about high volume account openings, so things like bank accounts, card issuing, merchant acquiring for payments, small business lending, the time to make the decision really matters, which then means it needs to be integrated, and it needs to live in whatever workflows those customers have.
And we wanted to be part of whatever the existing kind of workflows were and be able to plug into those things as easily and efficiently as possible. So I think about our primary user as, you know, it's product managers, and it's engineers, and it's data science teams who are really trying to do like highly efficient, scaled onboarding and deliver the best possible customer experience.
Ansel: You have this like customer-facing, developer-facing interface that's just really clean. But I imagine on the back end, right, government agencies don't necessarily have that.
Kyle: The back end is chaos.
Ansel: Yeah. That's every infrastructure platform ever. But like how did you all think about like building backwards compatibility? Was there just like a, "Hey, we're gonna start building kind of like abstraction layers," or did you kinda take it one at a time? 'Cause again, like I can't imagine the level of fragmentation in the back end.
Kyle: Yeah, I mean, imagine behind the scenes we're managing our data pipelines against hundreds of individual government agencies, and just in the US today, ranging across federal, state, county, and city governments.
(00:06:45)
There are no APIs for any of these. So in the back end, we're getting CDs sent to us in the mail. It's crazy. For some jurisdictions, we were buying an external hard drive and mailing it to the agency, and then they'd load all the data on it and send it back to us. And then we have this kind of big data platform, this entity resolution engine that runs and builds what we call identities, kind of like a cluster of all the data that belongs to the same business.
Those identities live on a graph, so we can start to traverse the relationships between companies. And then we've increasingly worked on this kind of intelligence layer that sits in the middle. So behind the scenes is really where a lot of the complexity comes from. And then what we wanted to be able to do is, to your point, you used the term abstractions.
It's like we wanna be able to abstract away all of that complexity to something that somebody who is not an expert in financial crimes, regulatory compliance, credit underwriting, can then be able to leverage, which means it needs to be understandable. We need to use common terminology that people can sort of interpret themselves.
Increasingly, we wanna make it obviously easy for, you know, coding agents and things to be able to use our products, which requires just like high-quality documentation. So we just thought a lot about like, you know, what's the basic API design? We try to have our products operate on this principle of like you should be able to provide us sort of the minimum amount of information that you have about an entity, and then we should do the hard work to go from that data to like a far more robust picture of that entity and set of information.
And again, just try to make that as easy as possible for people.
Ansel: Yeah. No, that makes a lot of sense, and I can't imagine how complex some of that has gotten over the years, right, as you've started to really understand all of the edge cases. How do you think about building infrastructure that can handle edge cases? Was there maybe one you could tell us about that was just particularly weird, that gives a better sense of, oh, this is what you have to do when you're building infrastructure that scales?
Kyle: Yeah. So I think the biggest learning for me was just taking the time to become an expert in like, digging through, to your point, the edge cases, like trying to find those edge cases.
(00:09:00)
Because then you can start to figure out, okay, well, what logic would we need to have in place, or how would we reason through that to get ahead of it? So, I mean, there's so many crazy edge cases dealing with business data. Like, okay, so a simple one would be, we deal all the time with typos or abbreviations or DBAs.
So a company, I don't know, let's take… Gusto maybe could be an example, right? Like the actual legal entity name for Gusto is like ZenPayroll, but then they go by Gusto, and so you have to deal with traversing from legal business names to DBAs.
Or depending on the states where you've registered your entity, different states have different depths of information available. And one of the things our clients wanna be able to know is what is the actual operating location of a company? And so sometimes you can get their operating location through the corporate filings.
(00:09:55)
However, many companies use a third-party registered agent to be able to actually form their entity with the states. And so while you might have an address for the business through the corporate filings, the actual address that is provided isn't their actual address. It's like the address of their third-party registered agent, which could be their accountants, or it could be a law firm, or it could be a company like LegalZoom.
And so, you know, today, our client might have an obligation to verify that they know where a company operates. If the address they have on file matches the address on the corporate filings, but that address actually belongs to the company's law firm, what they run the risk of is verifying the address of the law firm, which doesn't accomplish the thing they need to do, which is to verify the actual operating location of the company.
And, you know, the entity, in theory, let's say they could be actually based in, I don't know, like Russia or something like this. And then they'll incorporate their company through this law firm. But those are the things you really learn by getting into the weeds and reviewing these edge cases, doing tons of things by hand, because then you start to see all the areas where the product breaks, and then over time we try to encode the additional logic to be able to work through the exceptions.
(00:11:14)
Ansel: Yeah. No, I mean, talk about a moat. That's definitely one that you have to earn every single day, little by little with every edge case. I think that's something that, again, over time makes the platform much more valuable because if you think about your customers having to do this themselves, they're just never gonna be as good as you, especially if you've got 10X, 20X, 100X the volume that they'd have.
So I think that kinda leads really well into the next evolution of this product, right? I think we've hinted at it a little bit, like intelligence. You started with access: aggregating business data and making sure that it's usable, since it's spread across so many different systems. And so you started to go, "Okay, now what do we do with this data? How do we make it more actionable?"
Can you walk me through, like, what was the process of making that decision? Were there certain things that kind of gave you that conviction that, "Hey, we need to expand into this intelligence layer"?
(00:12:03)
Kyle: Yeah. So I guess building a data business, I've always thought about the sort of levels of value you can unlock, and there's three. There's sort of layer one, which is about dealing with just fragmented, messy data, and it– There's a certain level of value, which is just to do the legwork to be able to go access that information, and then structure it, and then document it, and make it available in a way that kind of just, like, removes overhead for people.
And then I think there's another layer of value, which is doing sort of fairly basic analysis on top of that information. It's a lot of, like, comparing and counting things. So maybe an example of that would be, you know, you look up a business through Middesk, and we can tell you that this business has filed for bankruptcy, and they filed for bankruptcy six months ago.
It's, like, fairly basic, just kind of comparing and counting. And then I think the peak layer of value is really when you can get to the point of synthesizing all the information that you have down into some sort of additional signal or perspective that you can really only get to once you can abstract away all the complexity and you have the whole breadth of data.
So we've always sort of thought through, like, the business strategy and evolutions. Like, we wanted to become sort of the global, over time, the global system of record and sort of source of truth for business data, and then use that to build this intelligence layer. The reason that we felt like, you know, now is the time to really start to invest in it...
I would say two things. One is, access to data is becoming easier. Depending on the type of data we're talking about, there was a point in time when data was kind of the new gold or, like, you know, the new oil or whatever. And I think now with LLMs and different agentic products, there's definitely a question of what type of data is really defensible in the long term. So on one hand, when you think about value creation, if you take a long-term view and you think that, hey, access to data is just gonna become easier and cheaper and faster, then you're really forced to start to think about what the next layer of value creation is going to be.
(00:14:06)
So I'd say like some of it is just thinking about where the market's going, but then also just like the tooling that's becoming available. So when I talk about doing, you know, synthesis of all the underlying data, I mean, now being able to have agents and these tools that can reason, it actually allows you to generate this intelligence in a way more interesting and efficient way, and that's allowed us to just accelerate the roadmap.
So, you know, we can now focus on prompt engineering and give it all the context to be able to reason through a layer of intelligence that really is about, you know, not "Does this business have a bankruptcy?", but "Is this business risky in the context of the type of account that they're trying to open with us, given what the business does, how long they've been in business, what we can see about them from their government filings, their online web presence?"
(00:14:57)
And so that's allowed us to really push the edge of the type of intelligence that we can create and do it in an efficient and really interesting way.
Ansel: Yeah. How does that change, like, your end user? Does that mean there's other groups, teams at your customers, like organizations that are now more involved now that you're kind of moving up maybe beyond a developer, and giving business value to other parts of the organization?
Kyle: Yeah, that's a good question. I mean, to my earlier point, historically, we really thought about the product managers and data science folks, and part of the reason for that is because our products historically have been fairly deterministic. And so you can build business rules off of it and automation off of it.
It is true that moving into this world of more reasoning-based intelligence, we start to move towards something that is less deterministic, and so we have started working more with, like, how can we help to support the workflows, especially of the operational teams?
(00:15:53)
So, you know, when somebody uses our product, say, and they can do kind of real-time decisioning off of the back of the product, that's great, but there are still many risk factors that people need to investigate. And so that's where now we start to incorporate more workflows within the product for operational users: kind of lightweight case management, the ability to move through actual case reviews, annotation, and decisioning in a more efficient way.
(00:16:20)
Ansel: Yeah. What's the feedback been like from customers as you start to go, "Hey, we're helping you make decisions faster"? I feel like there's probably a range, from early adopters to people that are like, you know, "I'm terrified of anything that even says or sounds like AI." Curious how you've kinda navigated that.
Kyle: I think every industry for the end customer is on its own adoption curve. And it's not even just about, like, the company, but it's also about the use case within that business. And in the context of financial services, helping to make decisions around regulatory compliance, fraud, and underwriting, I would say we're still very early on the adoption curve.
(00:16:59)
I do not think we are at the point today where people are actually leveraging or relying on agents to do full straight-through decisioning. Maybe for certain types of level one case reviews—for a sanction, some sort of sanctions hit where the odds of a false positive are very high, being able to move through that.
But to really make, like, an informed decision, that's not where we're at. There's a lot of excitement because most people's managers, or manager's manager's manager, is asking them to have a perspective on what their strategy is going to be. But I think right now for our industry, we're still at the stage of a lot of exploration.
(00:17:35)
We're trying to spend a lot of time hands-on, getting feedback from people, being able to really, like, work through their workflows with them and in some cases, help them document their workflows, and then be able to translate that into the way our agents help them reason across their work.
But there is still a human in the loop in terms of making final decisions. So I think today the value prop is largely about accelerating the work of the operational teams, but not actually automating the work fully. But there's still a huge amount of value to be created in just doing that.
Ansel: Yeah. And sometimes I feel like, especially with compliance and highly regulated industries, human in the loop is more of a, “it's not a bug, it's a feature,” right?
Because you do need some accountability, and I think sometimes it's hard, you know, if you just have an agent doing things, and if the people running them don't have the ability to audit or understand them, it creates this really potentially dangerous outcome, at least in the medium term when everything's still being developed and getting more sophisticated.
(00:18:35)
Kyle: People need to trust the result, right? And, like, to be able to trust it, I think that means you need to have a sense of confidence that it's reliable, like the thing's gonna do what it says it's going to do. And I think you need to really understand the competency of the company you're working with.
Like, do they really understand the domain well enough to be able to encode my logic and do it correctly with the right tools, with the right sort of safeguards? But that stuff takes time, and so people need to see it, and they need to kind of understand the way these things are working.
(00:19:10)
Ansel: Yeah. Yeah. Well, it sounds like we're still early days in terms of really getting AI adoption in certain parts of the industry to take over, like, full workflows. Zooming out, I think we're seeing a broader pattern more generally across the infrastructure space, where you kind of start with that access point and then start to drive intelligence.
I think Plaid is a great example of this, where it started with bank data and now they're doing fraud analysis, which makes a ton of sense. And so I think it's something that we’d like to follow as well, in your and Plaid's footsteps, for employment data. Where do you think this pattern is headed? I know that you've done some kind of, like, separate deep dives on different types of data out there. What does intelligence look like in, say, five years from now?
Kyle: I think just in the future, generally, my view is, like, intelligence just becomes, like, hyper-contextual, and it's contextual to the business itself. So, like, are you a traditional bank? Are you a fintech company? Are you a payroll business?
(00:20:07)
In our world, I think a lot about risk. So it's like the risk of that business model is different based on the types of products you deliver. So it's contextual to business. It's contextual to the use case. So thinking about a big bank like, you know, JPMorgan.
You can open up bank accounts, you can do merchant acquiring. They have an embedded payroll product, and the risk associated with each of those types of accounts is different because of the way that money moves or the risk of that one account. And then it's contextual to who the end customer is.
So, like, evaluating Middesk, you know, the risk of Middesk versus the risk of Google versus the risk of the coffee shop down the street. Like, all of those are completely different. So I think intelligence just becomes hyper, hyper-contextual, and the tools are only making it easier. We're really trying to find ways now to use the data platform that we have and sort of show the power of that data platform.
And just so, like, as an interesting example of some analysis and intelligence we were able to create that we would not have been able to, or it would've taken way longer, a year ago. Recently, this was about a month ago, the Department of Health and Human Services open sourced all the Medicaid payout data for roughly like a four to five-year window.
(00:21:20)
And, you know, we're talking about hundreds of millions of payments across, like, several million providers, and then those providers are also connected to several million more entities because they're parent companies, subsidiary entities. And so in about a day of work, we were able to go from, like, this open source data set to being able to query with natural language to be able to say, "Hey, you know, you're a fraud investigator specifically looking for Medicaid abuse."
Like, we go through this data set and try to find clusters that seem to be anomalous and high risk based on frequency of payments, size of payments, geographic, you know, proximity to other providers. And we, you know, in minutes could then get a full report of potentially high-risk clusters across the country and in Denver and in New York and all these different markets, and then be able to go and do these, like, deep dives and get to the point of relatively high confidence identification of fraudulent businesses that receive millions of dollars in payouts.
But to be able to get there, you could just sort of have this reasoning-based intelligence that understood the type of risk associated with Medicaid, the types of information about the providers, how a provider might try to defraud the system, and then find those patterns and do that full analysis in, you know, 30 minutes.
(00:22:45)
Like, it's just pretty, pretty wild. So I'm really excited about how to be able to look across massive data and find these patterns that in the past would've just taken a long time to be able to parse through.
Ansel: Yeah. I can imagine the other adjacent data sets, especially government ones that are pretty messy when you get them, but if you can then resolve the entities and unify them with this core identity platform, you could probably see some somewhat unsettling outcomes, but also some pretty interesting insights.
Kyle: Just like in this one example, like imagine... The federal government maintains this list of excluded providers that are not– they're barred from receiving any federal funding. In isolation, if you look at each individual entity, you wouldn't see the patterns. But if you look at the connections between businesses, like what we found is there were hundreds of millions of dollars going to companies that were not the blacklisted entity. But they were formed by the same person who owned the blacklisted entity.
They were operating from the same address as the blacklisted entity. And in some cases, the entity was formed on the same day as the blacklisted entity, and that non-blacklisted entity had received like tens of millions of dollars in Medicaid payouts. Is it to say that it's guaranteed fraud? No. But it's definitely risky.
And being able to pull those patterns out and being able to give the tools to our clients to be able to do that, it's really powerful.
(00:24:16)
Ansel: Yeah. No, I'm excited for more things that y'all do on that front.
I think there's just so much that you can help unlock that is, I think, again, it's all centered around: do you understand who this business is and who's formed it? And I think it's pretty wild that that took you a day. I can't imagine what y'all could do with a week, a year, with some more complex ones.
(00:24:37)
So I have one final question. Given everything we talked about today, right? Evolution of intelligence, all these data sets, things like that. How do you think these shifts will change what your job looks like in the next five years?
Kyle: Yeah. Man, five years, the world's changing so quickly right now, that's hard to say.
I mean, I think increasingly, if I look at the way we're running the company, the ability to organize data and insights about what's happening in the business, I hope my job becomes more about really clearly setting the direction and a lot less about extracting information.
And it becomes really focused on like, we can have all sorts of signals about how Middesk as a company is performing and like where we should go. And I think the job over time becomes about parsing through all the potential data points and just really getting good at selecting the ones that provide meaningful signal for the decisions we should make as a company. And I think that's kind of the way the CEO role will evolve over the coming years.
Ansel: Yeah. No, I'm excited for that. Even Jeremy and I have been definitely seeing a lot more of the extraction piece being a lot less friction-filled. So I'm excited for that because parsing through, you know, Slack messages and Notion docs is… it's just something that eventually we should be… at least as a starting point, it’s a good place for automation.
(00:25:55)
Kyle: Totally. And I don't think that'll take... I think this world you and I are talking about will happen like…
Ansel: Next week?
Kyle: It's now, but I think it will just continue to evolve and that will really be kind of how the role looks in the future.
Ansel: Yeah. Yeah. Well, thank you again for taking the time to come with us. It's been a pleasure having you.
Kyle: Thanks. Yeah, good to see you. Thanks again for having me.