
This Week in Enterprise Tech Episode 549 Transcript

Please be advised this transcript is AI-generated and may not be word for word. Time codes refer to the approximate times in the ad-supported version of the show.

Brian Chee (00:00:00):
This week in Enterprise Tech, we talk about vulnerabilities in power meters, and we talk to Lenley Hensarling from Aerospike about real-time decision data systems. Quiet on the set!

TWiT Intro (00:00:17):
Podcasts you love from people you trust. This is TWiT.

Lou Maresca (00:00:30):
This is TWiET, This Week in Enterprise Tech, episode 549, recorded June 23rd, 2023: Aerospike's Real-time Data Platform. This episode of This Week in Enterprise Tech is brought to you by Melissa. More than 10,000 clients worldwide rely on Melissa for full-spectrum data quality and ID verification software. Make sure your customer contact data is up to date. Get started today with 1,000 records cleaned for free at melissa.com/twit. And by Miro. Miro is your team's online workspace to connect, collaborate, and create together. Tap into a way to map processes, systems, and plans with the whole team. Get your first three boards for free to start creating your best work yet at miro.com/podcast.

(00:01:24):
Welcome to TWiET, This Week in Enterprise Tech, the show that is dedicated to you, the enterprise professional, the IT pro, and that geek who just wants to know how this world's connected. I'm your host, Lou Maresca, your guide to the big world of the enterprise. But I can't guide you by myself; I need to bring in the professionals and the experts, starting with our own principal analyst at Omdia. He's the man who has the pulse of the enterprise, and he's always, always, always busy. Mr. Curtis Franklin, welcome back to the show. Curt, what's keeping you busy this week?

Curt Franklin (00:01:51):
Oh, there's too much keeping me busy. I'm trying to get ready to head out, doing a little bit of vacationing over the next week or two. So obviously, as all of us know, that means that this week has been slammed. Lots of stuff going on. Published something new on omdia.com for our subscribers. Also had something new on darkreading.com, and had to write a couple of things that will be up on Dark Reading over the next week or two. So plenty to keep me busy, and lots to look forward to when I get back: a busy, busy remainder of July and August.

Lou Maresca (00:02:30):
Thank you for being here, especially during your busy season. Well, we also have lots to look forward to. I always look forward to having Mr. Curt and Mr. Brian Chee on the show. Brian, welcome back to the show. What's keeping you busy this week?

Brian Chee (00:02:41):
Packing <laugh>. I'm getting ready to actually go on a cruise in Alaska. So big change of temperature, but it ought to be fun. That, and I'm trying to go and revive a whole bunch of surveillance cameras. Seems people made some changes and it broke, so I get to fix things. A lot of fun though.

Lou Maresca (00:03:07):
I hear you. Well, speaking of busy, there has been a lot going on in the enterprise this week, so we should definitely get started. USB drives are spreading spyware again, cuz we know they kind of happen a lot, and China's Mustang Panda APT is actually going global, so we'll talk about that. Plus our guest today, Lenley Hensarling, chief product officer of Aerospike, is here to talk about real-time data platforms with massive parallelism and performance in surprisingly small server footprints. So we'll definitely talk about that; lots coming up. But before we do, since there's a ton of stuff happening in the enterprise, we should jump into this week's news blips. Amazon has announced a hundred-million-dollar investment in a new program, the AWS Generative AI Innovation Center, aiming to stay ahead in the competitive AI landscape.

(00:03:56):
According to this TechCrunch article, the mission is to expedite enterprise innovation with generative AI. Now, the initiative brings together AWS-affiliated data scientists, strategists, engineers, and solution architects with clients and partners, and aims to encourage customers to dream big, dreaming up use cases that can bring maximum value to their business by leveraging generative AI technology. The funding will fuel people, technology, and processes around generative AI, assisting AWS clients to ideate, design, and roll out new generative AI products and services. Now, the center will help customers build a roadmap around generative AI, focusing on use cases aligned with business value, developing proof-of-concept solutions, and of course scaling as well. The center will kickstart its activities with clients who have previously contacted AWS showing an interest in this technology. It mainly aims to support sectors like financial services, healthcare and life sciences, media and entertainment, and much more.

(00:04:55):
Now, in the race in the AI arena: not long ago, AWS launched a program for AI startups and unveiled Bedrock, a platform for generative AI-powered platforms and apps. But will this be enough to beat the competition from Google Cloud and Microsoft? With the AI market expected to reach nearly $110 billion by 2030, is it large enough for multiple players? That's the question, right? AI-focused investment is growing: Salesforce and Workday have also committed millions to AI startups. Amazon's entry adds to the escalating arms race among tech giants to claim AI dominance. Is Amazon too late, or will there be new initiatives to turn the tides?

Curt Franklin (00:05:39):
So it's an open secret in the cybersecurity industry that most customers are overwhelmed by the number of security tools that are available and the number of security tools they've already deployed. There's too much complexity, too many signals of compromise, and too many options for how to deal with it all. That may be why a survey published this week by OpenText shows that the vast majority, that's 86%, of small to mid-sized business customers using managed security service providers are aiming to reduce their current portfolio of security tools. In an article at Dark Reading, Geoff Bibby, Senior Vice President at OpenText Cybersecurity, is quoted as saying, quote, consolidation is twofold. Businesses are going to service providers because they don't have the staffing or financial resources to purchase and manage multiple tools. And from the service provider perspective, fewer tools and fewer vendors mean simplified billing and greater ease of doing business, end quote.

(00:06:41):
Now, this consolidation isn't something that's brand new. By mid-2022, when 83% of companies predicted the downturn this year, three-quarters of companies planned to reduce the number of security vendors they used. That was up from 29% in 2020. That's similar to what Gartner has reported seeing. Most midsize enterprises, those are firms with $50 million to $1 billion in revenue and up to 2,500 employees, are looking to reduce the number of vendors. But rather than doing it because they're focusing on consolidation, these companies are looking to optimize their security operations. Now, according to Gartner, two-thirds of these midsize enterprises are pursuing security vendor reduction strategies, while the remaining third have delayed consolidation but will likely simplify their security within the next three years. And while pretty much every company wants to simplify its processes and save money, those savings can be tough to come by, because what you wind up having to do is learn a new product, a new solution, and people have to get trained on it. So companies have to modify their processes, and modification, as we know all too well, takes time and, well, lots of money. Sometimes it seems small to mid-sized companies just can't win.

Brian Chee (00:08:13):
So I'm gonna summarize this Dark Reading article, and the reality is I'm actually gonna talk a little bit about one of the dirty little secrets of building automation. This article happens to be about the Schneider Electric ION meters and is complaining about login credentials being sent in clear text. Now, I need to point out it is not the only smart meter that does this. When I was doing a lot of work with the GE meters, those were doing the exact same thing. In reality, the default is no password at all. What bothers me a bit is that quite a few implementations of the IEEE 1588/Modbus TCP/IP gateways aren't encrypted, even though the protocol for many manufacturers has that ability, which is why so many enterprise smart meter installations are on isolated networks. Well, the good news is that most ZigBee and LoRa implementations, which are superseding a lot of the IEEE 1588 installations, start off using encrypted communication paths for, say, harvesting data off residential electric meters.

(00:09:26):
Schneider does go on to say that this protocol is nearly 30 years old and it was a different world; a kinder, gentler, newer implementation has emerged that is encrypted. Now, what we need to keep in mind is that the IoT world isn't new, and much of the IoT communication systems were developed in a kinder, gentler world. Heck, when we were working on the original wireless mesh system that became ZigBee (it was a DARPA project), the base protocol wasn't encrypted. Instead, we used an SSH-encrypted client to move data around. Since then, many wireless IoT systems like LoRa have implemented encrypted data paths as a default implementation. So anyway, I strongly recommend reading the Dark Reading article if you are at all involved with smart meters and building or campus management systems.
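
To see why clear text on these links is such a problem, here is a minimal sketch in Python of a raw Modbus TCP read request against a hypothetical meter at 192.0.2.10. The frame layout is the standard Modbus TCP one; the point is that every byte, including anything credential-like a vendor layers on top, travels as plaintext on port 502, because the base protocol has no authentication or encryption fields at all.

```python
import socket
import struct

# Modbus TCP ADU: transaction id, protocol id (always 0), remaining byte
# count, unit id, then the PDU: function code 3 (read holding registers),
# starting register, register count. No auth or crypto fields exist.
request = struct.pack(
    ">HHHBBHH",
    1,    # transaction id
    0,    # protocol id
    6,    # bytes that follow (unit id + PDU)
    1,    # unit id
    3,    # function 3: read holding registers
    0,    # starting register address
    2,    # number of registers to read
)

with socket.create_connection(("192.0.2.10", 502), timeout=5) as sock:
    sock.sendall(request)
    response = sock.recv(256)

# Anyone sniffing the segment sees exactly the same plaintext we do,
# which is why these meters end up on isolated networks.
print(response.hex())
```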

Lou Maresca (00:10:22):
There has been some big, exciting news in the world of space and telecommunications. Over the past 18 months, we've seen some outstanding achievements, from the James Webb Space Telescope's launch to Starlink's use in Ukraine. But according to this Fibre Systems article, a quieter event may be the biggest game changer yet. In December 2022, NASA's TBIRD mission successfully demonstrated a groundbreaking hundred-gigabit-per-second data link from a small CubeSat-sized satellite. Unlike Starlink 2.0's inter-satellite lasers, NASA's system can actually penetrate the atmosphere, clouds, and bad weather at a fraction of the cost, with speed equal to terrestrial fiber connections. And this speed is likely to increase as the technology matures. What does that mean for us? Well, laser communications are directional, meaning they don't interfere with other transmissions, require less complicated regulation, have lower costs and less carbon emissions, and provide enhanced security.

(00:11:22):
Now, we can expect improved positioning systems and better uptime and reliability with satellite ground links. And consider the cost implications: a global constellation of these satellites would maintain terrestrial fiber speeds while reducing costs significantly. Imagine a future where high-speed Wi-Fi on a transatlantic flight or swift internet access in disaster zones is actually a reality. Now, the breakthrough does raise some questions about regulation and protection from misuse, but if TBIRD's success can be reliably replicated, we may be on the cusp of a communications revolution. So move over, Dr. Evil: it's not just about sharks with laser beams anymore, but satellites. Well, folks, that does it for the blips. Next up, the bites. But before we get to the bites, we do have to thank a really great sponsor of This Week in Enterprise Tech, and that's Melissa, the data quality experts. E-commerce losses to online payment fraud were estimated at $41 billion globally in 2022.

(00:12:21):
Melissa ID helps organizations tackle identity fraud with an end-to-end process to safely track customer data. The biometrics verification step features an algorithm that recognizes a match between the user's selfie and their ID image. Using an advanced algorithm, that process analyzes over 60 facial features, including changes of facial hair, makeup, age, hairstyle, skin imperfections, and head pose. This cutting-edge technology offers businesses an additional level of security and assurance that their customers are always genuine. Now, with this process, businesses can ensure that their customers are who they claim to be, enhancing trust and credibility. The financial industry and e-commerce retailers are already using AI-powered fraud detection systems to safeguard their customers from fraud. For example, banks and credit card companies are using advanced machine learning algorithms to meticulously analyze customer data, including purchase history and location, to identify any potentially suspicious transactions that are out there.

(00:13:26):
Plus, retailers are actually working at this as well. They're utilizing powerful AI-driven fraud detection systems to effectively prevent fraudulent orders and chargebacks, reduce risk, ensure compliance, and keep customers happy. With Melissa's ID tools, you get the following: the proof-of-life feature does a face scan and checks for eye movement to establish the person is real. The document verification feature captures ID and document data using machine-readable zones and optical character recognition document scanning to extract crucial information from passports, utility bills, driver's licenses, and much more. Biometrics uses smart facial recognition and facial comparison algorithms to recognize a match between a selfie and an ID image. Since 1985, Melissa has specialized in global intelligence solutions. Plus, Melissa continually undergoes independent security audits and is SOC 2, HIPAA, and GDPR compliant, so you know your data is in the best hands.

(00:14:29):
Make sure your customer contact data is up to date. Get started today with 1,000 records cleaned for free at melissa.com/twit. That's melissa.com/twit. And we thank Melissa for their support of This Week in Enterprise Tech. Well, folks, it's time for the bites. Researchers at Check Point Research have uncovered a rather devious piece of espionage malware that propagates itself via infected USB drives. I'm gonna summarize this Dark Reading article now. The malware came to the forefront during an incident at a European healthcare institution, underscoring the sophistication and stealth of today's cyber threats. Now, we are no strangers to the dangers of USB drives found on the floor, whether it be Black Hat or airports. However, the elusive backdoor malware named WispRider is believed to be the handiwork of the Chinese state-sponsored advanced persistent threat (APT) group known as Camaro Dragon, but you might know them as Mustang Panda as well.

(00:15:29):
The WispRider threat was discovered after an employee at the healthcare institution participated in a conference in Asia. He unknowingly came home with an infected USB drive, and the malware spread throughout the hospital's computer systems after he used the drive at his institution. Mustang Panda, once primarily focused on Southeast Asia, seems to be expanding its nefarious activities worldwide. This was confirmed when the group was found to be behind a cyber espionage campaign against the Russian military, despite China's tacit support for Russia. Let's get into the specifics of WispRider, which has evolved significantly since Avast first identified it last year. The malware now has the ability to spread through USB drives using a launcher named HopperTick. It cleverly includes a bypass for SmadAV, an antivirus solution popular in Southeast Asia, and performs DLL side-loading using components of well-known security software and even components of gaming giants like Electronic Arts and Riot Games. Check Point has alerted these companies about the malware's activity. How serious is the threat, given that most organizations have, you know, even asked that USB storage not be used to ensure data loss protection? That is the first question I wanna throw to my co-hosts here, because the fact is, isn't USB kind of an old technology? What do you think, Curtis?

Curt Franklin (00:16:53):
Well, USB is a technology that is not as common as it once was as a way of moving files around, but it's still quite common. And we also have to realize that things are not uniform across the world. We tend to think of how we do things here in North America and probably in Western Europe; that might not be as common in other parts of the world. In a lot of those places, a good USB stick is still a great way to move files around. And I'll be honest, I know a lot of people here in the States that use USB sticks when they're moving things like large photo or video files, things that are just cumbersome to stick somewhere online and have someone else download. So while I would say that it's declining in use, USB drives are still out there in huge, huge numbers.

Brian Chee (00:17:59):
Now,

Lou Maresca (00:17:59):
Question for you, Cheebert, because what can businesses do here to protect themselves from such a, I guess you'd say, legacy threat?

Brian Chee (00:18:07):
Well, let's put it this way. One of the key features of almost every virtual desktop or thin desktop or whatever type of system out there where you have corporate control is being able to turn USB support, USB storage support, on and off. So that's kind of, you know, broad and sweeping. I am gonna bring up a little bit of history here. I was at DEF CON, and one of the three-letter agencies here in America decided to try and educate the users by sprinkling random USB drives around the conference. And when people tried to stick them into their machines, it would pop up a message saying, you've been pwned. Well, not really, but this is what could have happened. And so it was more of an education thing. It's gotten to the point where a lot of large organizations are now saying USB drives are pretty much disposable.

(00:19:09):
Make sure you go and open a fresh one. You know, don't go picking it out of a big basket anymore. And I'd say they actually have taken the place of what we used to use floppies for, and the price has dropped to the point where it's almost as cheap as floppies. That is not to say that they haven't opened a new pathway. In this case, the article actually talks about how even isolated networks, like for instance power control in power plants, high-security facilities, and so forth, are air-gapped networks, and this is one way for them to go across that air gap. And I'm going to bring up one last thing, and that is, keep in mind, a lot of these nation-states are into collecting grains of sand. You collect enough grains of sand and pretty soon you have your own beach. So don't let 'em collect the grains of sand. Start with being careful about your USBs. I actually don't let my work USBs out of my sight anymore, because there's just so many things happening, and I now tend to buy really, really cheap, like eight-gig or 16-gig, USB drives in bulk that are still sealed. And I throw 'em away after I've sent one out to someone that I don't know and trust implicitly.
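
For the "turn USB storage support on and off" control Brian mentions, here is one concrete form it takes on a single Windows endpoint: a minimal Python sketch using only the standard library, assuming administrator rights. Fleets would normally push the same setting through Group Policy or MDM rather than a script.

```python
import winreg

# The Start value of the USBSTOR service controls Windows' USB
# mass-storage driver: 3 = load on demand (enabled), 4 = disabled.
USBSTOR_KEY = r"SYSTEM\CurrentControlSet\Services\USBSTOR"

def set_usb_storage(enabled: bool) -> None:
    with winreg.OpenKey(
        winreg.HKEY_LOCAL_MACHINE, USBSTOR_KEY, 0, winreg.KEY_SET_VALUE
    ) as key:
        winreg.SetValueEx(key, "Start", 0, winreg.REG_DWORD, 3 if enabled else 4)

set_usb_storage(False)  # block USB mass storage on this endpoint
```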

Lou Maresca (00:20:39):
Right? It's funny, because they used to give these out a bunch at conferences, obviously with a bunch of decks on 'em or whatnot. Now it's all cloud-based. Actually, this brings me to my next question. A lot of organizations are trying to implement zero trust. They're trying to ensure data loss protection, and so they're asking users to store things on cloud storage, local storage, that kind of thing, to ensure that they can audit this type of access. And so when things are shared, they know what type of data is being, you know, ingressed and egressed. Curtis, back to you: if a USB drive has to be used, does it make sense in a zero trust environment? Is there really a reason for it?

Curt Franklin (00:21:23):
USB sticks don't make sense in any sort of highly secure environment. There are just too many issues with them, and they're too difficult to control. Now, having said that, are there some somewhat expensive ways to have secure, you know, USB drives? Sure. But do most companies use those? No. Statistically, do any individuals use them? No. So, you know, if you are concerned with security, you find another way to deal with it. You know, as I said, the place where things get complicated is when we're dealing with very large files. I mean, if you're dealing with spreadsheets, with texts, with PowerPoint decks, things of that nature, then you can stick 'em in the cloud, do a secure file transfer; all kinds of things come into play. The problem exists when you have to deal with video files, big video files for editing, large graphic files, large audio files, things of that nature. And that's when you have to, as a unit, as a group, figure out some way to securely deal with those occasional large file transfers. Because if you don't, I guarantee that your users will find a way around your security, and that takes you into territory you don't want to visit.

Lou Maresca (00:23:04):
Right, right. Now, Cheebert, you're familiar with a lot of network technology. I can tell you for a fact, I saw an organization just recently that has a bunch of network appliances, semi-older, I'd say maybe five years old, that still only update via USB. What's the way to protect there, if somebody has a malicious file on one of those devices? Yeah,

Brian Chee (00:23:27):
It actually gets worse. It's not just network devices, it's servers. In almost all the servers I've worked with, there's no easy way of doing a BIOS upgrade or, you know, any kind of firmware update on a server without getting a USB involved. So for that, what we did with the three-letter agencies is we would get a trusted source of USBs, typically ones that are still sealed. We'd open it, and we'd make sure, when we downloaded the file that we needed, we would run a hash on it. So regardless, you should be running a hash. Now, I'm not going to say that this is perfect. There have been some cases where corporations got hacked back at the distribution, back at their download server. And I don't have a solution for that. But at least if it's coming straight from the company, and the hash matches, and you created that USB for the upgrade, you've reduced most of the dangers. You know, insider threats are, you know, all bets off.

(00:24:45):
But the other thing that I used to like doing, and I still do: I still use optical, write-once. I'll write to a CD or a DVD that can only be written that one time, and then I'll run the finalize process so it cannot be added to. Not perfect, but that's how I will transfer larger files in a more trusted environment, and I will make sure I run a hash on that and use a different method to send that hash file or hash signature to the destination.
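
The hash check Brian describes is easy to script. Here is a minimal Python sketch; the firmware file name and the published digest are hypothetical stand-ins for whatever the vendor actually posts next to the download.

```python
import hashlib

# Hypothetical firmware image and the SHA-256 the vendor publishes alongside it.
FIRMWARE_PATH = "server-bios-v2.14.bin"
PUBLISHED_SHA256 = "replace-with-the-vendor-published-digest"

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Stream in chunks so large images don't need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

actual = sha256_of(FIRMWARE_PATH)
if actual != PUBLISHED_SHA256:
    raise SystemExit(f"Hash mismatch: got {actual}; do NOT flash this image")
print("Hash verified; safe to copy to a freshly opened USB drive")
```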

Lou Maresca (00:25:28):
Now, one thing I find pretty interesting here, one more question, Curt, because I find it interesting that, you know, this type of threat seems to actually be targeting organizations that might not have user education around this, or maybe don't have the resources or the services to protect themselves from this type of threat. The example given in the article itself was a hospital in this case. So it seems like this type of threat is targeting very specific organizations. What do you think, Curt?

Curt Franklin (00:26:01):
Well, that's classically what USB-carried threats have done. I mean, let's all remember the infamous case of Stuxnet, which ended up hitting the Iranian centrifuges because of malware carried on a USB stick that got inserted in a lab in Iran. The people most vulnerable to this kind of attack are those with a lot of technical training in the wrong areas, because that often presents a false sense of confidence about what they can get away with. I hate to say it, but many physicians fall into that category. They're highly educated, very well trained in their specialty; that does not necessarily mean that they're superb IT people. Many categories of engineers fall into the same thing, and you can go on and on. But the basic thing is that this is a category of employee, the highly trained, often highly intelligent, and highly motivated individual, that presents a special vulnerability for IT to deal with. And unfortunately, they are frequently the most difficult cats to herd in the right direction.

Lou Maresca (00:27:46):
Oh, folks, that does it for the bites. Next up we have our guest, but before we get to our guest, we do have to thank another great sponsor of This Week in Enterprise Tech, and that's Miro. Let me ask you something: are you still dealing with the annoyance of constantly switching between tabs and tools and losing important information and ideas along the way? Well, have you heard of Miro? It's the cool visual platform that combines all your work in one place. Whether you're working from home or in a hybrid workspace, it's a super powerful collaboration tool that lets everyone on your team share their thoughts and make something great together. Shorten time to launch so your customers get what they need faster. With Miro, you need only one tool to see your vision come to life: planning, researching, brainstorming, designing, and feedback cycles can all live on a Miro board, across teams.

(00:28:36):
And faster input means faster outcomes. In fact, Miro users report the tool increasing project delivery speed by 29%. Miro also allows you to effortlessly view and share the bigger picture. When everyone has access to the same information and a voice, your team is always engaged, invested, satisfied, and most importantly, happy. Additionally, using Miro's templates, like the swimlane diagram, can help you avoid confusion regarding responsibilities, processes, roles, and even timelines. Strategic planning becomes easier when it's visual and accessible. Tap into a way to map processes, systems, and plans with the whole team, so they can not only view it but have a chance to give feedback as well. If you're feeling meeting fatigue, I know I am: Miro users report saving up to 80 hours per user per year just from streamlining conversations and feedback. Ready to be part of the more than 1 million users who join Miro every month? Get your first three boards for free to start working better together: miro.com/podcast. That's M-I-R-O dot com slash podcast. And we thank Miro for their support of This Week in Enterprise Tech. Oh, folks, it's my favorite part of the show, where we get to bring in a guest to drop some knowledge on the TWiT Riot. Today we have Lenley Hensarling; he's Chief Product Officer of Aerospike. Welcome to the show, Lenley.

Lenley Hensarling (00:30:05):
Hey there, glad to be here with you.

Lou Maresca (00:30:08):
Yeah, thanks for being here. So, you know, before we get into the show: our audience has a large spectrum of experiences, whether they're entry level all the way up to CTOs and CEOs, and a lot of them like to hear people's origin stories. Can you take us through your journey through tech and what brought you to Aerospike?

Lenley Hensarling (00:30:26):
Yeah, for sure. So, you know, in terms of my own origin story, I'm a retread. They didn't have computer science departments back when I started. I started in economics, and we were one of the groups that, you know, made models on computers that filled floors of buildings and had about a quarter of the compute that you've got in your phone, if that, right? And I came back and retooled at one point, got into financial software, and then got over onto the systems software side, where I'm at now at Aerospike. And I've sort of played the game from all sides: a lot of time, you know, typing code, a lot of time managing large engineering organizations, and a lot of time on the business side, right? Because, you know, as we progress, that inevitably happens. And now I'm chief product officer and, you know, drive our product direction at Aerospike.

Lou Maresca (00:31:29):
Fascinating. So I think one thing we should definitely get into, because this is a very interesting topic: parallel computing and data processing is sometimes the holy grail for some organizations. It's very challenging to get there, get it right, and scale well. Can you maybe share some examples of organizations that might benefit from combining those two things?

Lenley Hensarling (00:31:52):
So I think that, you know, I'll use an example of a customer we have who spoke at our summit conference yesterday <laugh>: Guy Dasa at Dataminr. And Dataminr looks at over 50,000 data feeds, you know, simultaneously, ingesting huge amounts of data and then developing alerts about things that are happening in the world. And what we see now in enterprises is people using both internal data and what I call global contextual data, right? Many, many data sources. So when you have that many data sources, just the ingestion becomes a problem, and you want to pull it all into the same place. You want to capture it effectively, meaning, you know, not drop any on the floor; you capture all the transactions coming in, all the data, and you need to put it to use as soon as you can, right?

(00:32:46):
And that means that you have to put a lot of processing and a lot of data movement in parallel at the same time. So we're a distributed database, truly distributed. You know, we both scale up and scale out, and that means you can put a lot of different processors going against things. But we also write our code so that we're able to handle millions of transactions per second. We have people doing 12 billion transactions a day. And the only way to do that, you know, in a cost-efficient way, is with commodity hardware, commodity instance types in the cloud, and to put them to work in parallel, acting as one concerted group handling the workload, right? And that's really what clustering algorithms are about. And we really focus on those things and on being efficient.

Lou Maresca (00:33:49):
So that gets me to my next question, cuz obviously you're using some interesting algorithms. Can you maybe share just a little bit about some of the technology you're using to do this, even at such a small scale of resource requirements?

Lenley Hensarling (00:34:03):
Yeah, so I'll take this from two standpoints. One is that, and this is one of our key pieces of secret sauce, right, what we have is the ability, for a given server, to project down onto the data workload over 36 terabytes and operate on it as if it were memory. So we use SSDs in a different way. We don't go through the file system, we don't go through the operating system; we write directly to the NVMe driver, and then we treat it like blocks of memory, and we're able to do that in a massively parallel way with multiple, you know, IO channels. And that allows us to use fewer servers to cover a given workload. I'll give you an example: we did a benchmark setup with AWS and Intel for a petabyte data set, and did it with 20 servers.

Curt Franklin (00:35:08):
Well, I've got to ask: with those 20 servers, are you essentially striping the data across them? Or are you doing something a bit more sophisticated, using some other architecture for the data?

Lenley Hensarling (00:35:21):
Yeah, so, great question, Curt. We're not really striping the data. We divide it up into 4,096 partitions. We shard it out, but you don't have to shard it yourself; we do it based on a hashing algorithm. We also constantly adjust that: we have load-balancing algorithms going on all the time in the background, and we're redistributing that data. And that partitioning algorithm, it's a roster-based scheme for clustering rather than a quorum-based one. A lot of people go, how are you able to do this? And we have to then explain: oh wait, you're thinking it's a quorum-based model. It's not; we're a roster-based model, and that allows us to give higher availability. And if we lose a node in a cluster, we have replication factors that allow us to go find all the partitions that have the same data and then rebalance and re-form the cluster. And so you'll never even know what happened. We have people who, you know, are online, and they may have 50, you know, a hundred nodes, and they'll lose a few because of the vagaries of the cloud <laugh>, if you know what I mean about that. And then it just keeps on working, and they'll come back and replace those nodes later, and just suffer zero downtime and really almost no performance impact.
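
To give a feel for this kind of key-hash partitioning, here is a toy Python sketch. It is not Aerospike's actual algorithm (the digest function, digest bits used, and roster layout here are invented for illustration), but it shows how a fixed 4,096-partition space plus a roster table lets a client find a record's node without asking anyone.

```python
import hashlib

N_PARTITIONS = 4096  # fixed partition count, Aerospike-style

# Hypothetical roster: partition id -> node names holding its replicas
# (master first). A real cluster maintains and rebalances this continuously.
roster = {
    pid: [f"node-{pid % 5}", f"node-{(pid + 1) % 5}"] for pid in range(N_PARTITIONS)
}

def partition_of(key: bytes) -> int:
    # Hash the key and fold the digest down to a partition id.
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % N_PARTITIONS

def nodes_for(key: bytes) -> list:
    # A roster-aware client can jump straight to the right node: one hop.
    return roster[partition_of(key)]

print(nodes_for(b"user:42"))  # e.g. ['node-3', 'node-4']
```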

Curt Franklin (00:36:51):
Now, I wanted to ask, because you talked, when you were answering an earlier question, about the number of transactions that some of your customers have in a given day. So do you position yourself as a pure OLTP engine? Or does your architecture allow you to be accessible, and useful on a commercial basis, as an analytics engine as well?

Lenley Hensarling (00:37:19):
So I'll give you a nuanced answer to that, Curt. You know, by and large we sort of position ourselves as an operational database, which really speaks to the OLTP side. So unlike a lot of NoSQL databases, we are strongly consistent. We support linearizable reads, you know, and are able to be strongly consistent with transaction rates that remain up in the millions per second. The other side of it, though, is that we are used quite often to drive machine learning, because we'll ingest all this data, and we're able to do that because we support mixed workloads really well: high write rates, but also high read rates, and high mixed-workload rates. So as stuff comes in, and when you're talking about real-time workloads, if you have to move the data around to do, you know, some level of training or learning, then you lose time. So being able to operate in place, and then also write the features back into the same data store and have them accessible for queries, if you will, you know, that's used a lot in fraud situations, in identity management situations. And tightening that loop is a big benefit.
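
As a sketch of that write-features-back pattern with the standard Aerospike Python client: the namespace, set, bin names, and the scoring formula below are hypothetical, but the shape, high-rate feature writes and single-hop reads against the same store at decision time, is the pattern being described.

```python
import aerospike

# Connect to a hypothetical cluster; namespace/set/bin names are illustrative.
config = {"hosts": [("127.0.0.1", 3000)]}
client = aerospike.client(config).connect()

key = ("features", "users", "user:42")

# Write freshly computed features back into the same operational store.
client.put(key, {"txn_rate_1h": 17, "geo_risk": 0.23, "device_age_d": 412})

# At decision time, read them back in a single low-latency hop.
_, _, bins = client.get(key)
score = bins["geo_risk"] * 0.6 + min(bins["txn_rate_1h"] / 100, 1.0) * 0.4
print("fraud score:", round(score, 3))

client.close()
```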

Curt Franklin (00:38:51):
I know that there are some operational databases out there that get their performance, especially when it comes to doing things like ad hoc queries and other things sent against the data, by sucking vast amounts into a truly, horrifically expensive blob of memory, so that you're doing in-memory transactions as much as possible. Now, does your architecture let you provide the performance without having to write a, you know, nine-figure check for quantities of memory?

Lenley Hensarling (00:39:35):
Yeah, exactly. So a lot of solutions that sort of claim the same things we do, as you said, they're in-memory solutions, and DRAM, as you point out, has a tax on it <laugh>; you know, it's more expensive. So as I said, we're using NVMe SSD space to extend memory. What we do is we put our indexing in memory (this is our hybrid memory architecture) and then all of the data on SSDs. And we're able to get access to a piece of data in a single hop from the client, because the client, you know, the application, is a first-class participant in the cluster architecture; it has the roster as well, okay? And then the other thing is that, with going to NVMe, we are doing that access to data in single-digit to even sub-millisecond latency, as if it were memory. And we can do that across data sizes from gigabytes up to petabytes, because of this partitioning algorithm that we have, and because of the roster-based solution that we have, the client can understand where to find that piece of data, you know, in one hop.
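
A minimal sketch of the index-in-DRAM, data-on-SSD idea, not Aerospike's actual on-device format: an in-memory map holds each key's (offset, length) on the data device, so a read costs one in-memory lookup plus one positioned read.

```python
import os

# In-memory primary index: key -> (byte offset, record length) on the SSD.
# In a hybrid-memory design this index lives in DRAM; the records do not.
index = {}

def write_record(fd: int, key: bytes, record: bytes) -> None:
    offset = os.lseek(fd, 0, os.SEEK_END)  # append to the data file/device
    os.write(fd, record)
    index[key] = (offset, len(record))

def read_record(fd: int, key: bytes) -> bytes:
    offset, length = index[key]           # DRAM lookup: no disk touched yet
    return os.pread(fd, length, offset)   # one positioned read from the SSD

fd = os.open("data.seg", os.O_RDWR | os.O_CREAT)
write_record(fd, b"user:42", b'{"geo_risk": 0.23}')
print(read_record(fd, b"user:42"))
os.close(fd)
```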

Curt Franklin (00:40:59):
Well, I know that my colleague Brian has a whole set of questions he's waiting to ask, but I've got one more, and I hate to say it, I'm, you know, putting on my analyst hat here. As you say, I am familiar with some of these other databases, and many of them are tied to large enterprise applications, specifically something like SAP HANA. So you are not tied directly to any specific enterprise application. Who would you see, or what technologies do you tend to see, as your competition? I mean, when you go to a customer, or when a customer comes to you, who do they say they're comparing you to?

Lenley Hensarling (00:41:55):
So I'll say this: a certain portion of our business is brownfield, you know, replacement of technology. And so people on Cassandra might have, you know, 2,000, 2,500 nodes, right? It scales out marvelously, is what I'll say. And they just go, you know, it becomes unmanageable. And those nodes, as we were talking about, you know, to get the latency, they're using higher DRAM ratios, and it becomes expensive. And so we'll replace something like that with, you know, 250 or 400 nodes, okay? And that's something we see over and over again, both with Cassandra and with Couchbase. We had a customer, Criteo in France, a large ad tech company, that was using Couchbase. They were able to cut the number of nodes they had for those workloads down enough to close more than one data center that they had.

Brian Chee (00:42:59):
Well, wow, okay. So first off, there's a person in the chat room, he's not saying his or her name, who apparently used to work for you back in 2001 in the Denver office as a Unix platform manager for JDE, and they're just saying hello.

Lenley Hensarling (00:43:20):
JD Edwards. You bet, <laugh>.

Brian Chee (00:43:23):
So my question actually is: real-time platforms are spectacular, and there's all kinds of things you can do with them, but they tend to create a decent amount of data. You know, information overload in any real-time system gets to be a real problem. I dealt a lot with the military, and that was one of the big problems with a lot of the real-time systems we used. How do you propose we start dealing with that as more and more systems start coming online?

Lenley Hensarling (00:44:00):
So we see an interesting pattern in what customers are doing. You know, these days we learn as much from our customers as they learn from us, right? And so we call it an edge-and-core deployment model. The edge might be hundreds of terabytes, and it's keeping data maybe up to a month. And then they'll simultaneously, through our change data capture, be filtering that data back over to a cluster that might be multiple petabytes. And there it remains historical context data, because sometimes, right, you don't know when you're gonna need access to certain data. So they'll go through and they'll say, hmm, didn't find him here; now we're gonna go back to the core, if you will. And they find it, and then they can use it there, or they can pull it back.

(00:44:58):
Okay? And the other thing I'll say is that we're kind of interesting, cuz that brings up the caching question, right, to some extent. We are not used as a cache so much as a twinning of systems: this is the real-time thing. And, you know, we are used in financial services sometimes for intraday systems of record, okay? So it's all real time, but then they'll filter it back over to a mainframe that keeps all that historical data, and that allows them to offer different levels of service, right? We've all seen the thing where you deposit money and you want it to show up, like, not three days later, not a day later, not hours later, but right then, right? And they're able to do that type of thing. But then all the regulatory stuff, all the, you know, accounting stuff, winds up being handed off somewhere else.
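
As a toy model of that edge-and-core pattern (nothing here is Aerospike's API; it is a plain-Python illustration): every edge write also lands on a change log that gets shipped to the core, and only the edge expires old records, so the core keeps the historical context.

```python
import time
from collections import deque

EDGE_RETENTION_S = 30 * 24 * 3600   # keep roughly a month at the edge

edge_store = {}
core_store = {}                     # stand-in for the multi-petabyte core
change_log = deque()                # stand-in for a change-data-capture stream

def edge_write(key: str, record: dict) -> None:
    record["_ts"] = time.time()
    edge_store[key] = record
    change_log.append((key, record))  # CDC: every write is queued for the core

def ship_changes_to_core() -> None:
    # In production this streams over the network; here it drains a queue.
    while change_log:
        key, record = change_log.popleft()
        core_store[key] = record

def expire_edge() -> None:
    cutoff = time.time() - EDGE_RETENTION_S
    for key in [k for k, r in edge_store.items() if r["_ts"] < cutoff]:
        del edge_store[key]           # the core keeps the historical copy

edge_write("txn:1001", {"amount": 42.50})
ship_changes_to_core()
expire_edge()
print(len(edge_store), len(core_store))
```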

Brian Chee (00:45:59):
Well, I'll tell you what, your PR group was very nice to send us some of your references, like PayPal, Wayfair, Adobe, Airtel, and others. Instead of asking you to break faith with them, why don't we just ask you more? You are obviously not our father's database. What kinds of things is your system appropriate for, and how do you plan for this? You know, what's involved with putting together the bricks and mortar to use a real-time system like this?

Lenley Hensarling (00:46:38):
Yeah, so in terms of generalized use cases, right, there are a couple. You know, I mentioned that we are a transaction capture point. So, you know, for real-time capture and making that available, right, so that other people in a firm can see it and take action on it. And we have change data notification capabilities that can let people know this data has changed; we can push it out to a different application and everything. But setting up, you know, it's less about real time than that a truly distributed system is a different game, and people have to think of it differently, right? People are trained with things like Postgres where, you know, we have a server and then we have a high-availability backup, if you will, right? And when you say, here's my data, which node is it on? In ours, it might be moving around, you know, moving from one place to the next, right?

(00:47:42):
And so I think that's something that people have to understand, and we manage all of that as opaquely as possible, is what I'll say, so that they don't have to follow it. In terms of just real-time use cases, I think feature stores, right? We all talk about AI and ML a lot now, and features are generated, but the question is, you don't get value from AI and ML until those features are deployed back to the edge and used for decisioning in real time, and used for decisioning at scale, is the other thing we see. And this notion of at scale: you know, I mentioned we do, you know, 12 billion transactions a day, but when you talk about using multiple features, you know, hundreds of features, a terabyte of features, to solve a decisioning problem, and you have 20 to 40 milliseconds to do that without creating friction in the interaction between the person and the system, you have to really do things quickly, and that's where we excel.

Brian Chee (00:48:54):
Okay. So since we're talking about responding within milliseconds, let's ask this question. I'd like you to polish up your crystal ball a little bit. Everybody and their uncle is going crazy over AI and how it's going to affect different industries, and it occurs to me that one of the things that happens is being able to respond quickly, say in the financial world, where milliseconds can make a huge difference. Is this something you folks are going to look at? And is this something that you see as appropriate for the type of work you folks do?

Lenley Hensarling (00:49:36):
Yeah, it is. And I'll say we've been on a journey in some sense, right? We started out just being a key-value store. We added documents. We just announced support for graph data models and graph processing. But the key thing that we sustain across all of these is to do it in a way that scales, both in terms of throughput and in the size and amounts of data applied, okay? And when you talk about what's going on in AI and ML right now with generative AI, it's great, right? We can all go play with ChatGPT, and it's kind of amazing, right? But then when you think about applying it in a business, it comes down to being able to take the vectors that are generated in the learning systems that you now have and apply them quickly at the edge, which means that you have to be able to do, you know, proximity searches with Hierarchical Navigable Small Worlds, they call it.

(00:50:39):
I love that, actually, you know: HNSW. But it's a different search algorithm, and it's gonna require the same types of application of parallelism in the hardware, being able to get access to data. You know, those vectors are essentially multidimensional arrays that are flattened out. You have to be able to pull that back and find those elements and get through all those computations very quickly. So, you know, when you ask something in ChatGPT, it's sort of typing at a given speed and it's cranking along. But when you're talking about doing a trade, it's gonna be something that has to be applied really quickly. And we think it's the same problem, just an extension of the work we've done with the typical features that are applied coming out of machine learning today.
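
To ground the vector-search piece, here is a brute-force nearest-neighbor sketch in Python with toy vectors. HNSW libraries such as hnswlib return approximately the same answer, but walk a layered proximity graph instead of scanning every vector, which is what makes millisecond lookups over millions of embeddings feasible.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy embedding store: id -> flattened vector (real ones hold millions).
vectors = {
    "doc-1": [0.1, 0.9, 0.2],
    "doc-2": [0.8, 0.1, 0.4],
    "doc-3": [0.2, 0.8, 0.3],
}

def nearest(query, k=2):
    # Brute force is O(n) per query; HNSW reaches roughly the same result
    # in near-logarithmic time by navigating a small-world graph.
    scored = [(vid, cosine_similarity(query, v)) for vid, v in vectors.items()]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:k]

print(nearest([0.15, 0.85, 0.25]))
```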

Brian Chee (00:51:33):
All right, so your PR group sent me an interesting little tidbit: they're talking about how you folks are proud of helping reduce people's carbon footprint. It doesn't feel intuitive, but obviously there's a story there. Why don't you share it?

Lenley Hensarling (00:51:58):
Yeah, so, you know, we talked about being able to use commodity hardware and use fewer servers, right? I said, you know, somebody might be using 2,500 servers, and we would replace it with, you know, 200 or 300, right? And that's the savings in electricity and all of the different, you know, materials that go into making a computer. And as we, you know, look at an AI-driven future, it means that more and more data is going to be applied, and more and more computation, and being able to do that efficiently and effectively, with a minimum use of hardware to accomplish a given task, becomes ever more important. So we published a paper coming out of work we did with Amazon when we ported our solution to Graviton, right, to their Arm processors, because Arm processors save energy. That's the big key. They're faster, but the faster, being able to do more computations, isn't the big thing. They just use way less energy, and using fewer servers in combination with something like an Arm chip that uses less power means that you don't need as much electricity.

(00:53:21):
And it's significant. Think about that: 2,500 servers, you know, down to 300, something like that. You're gonna save money too. You know, I worked with Larry Ellison at Oracle, and, you know, he would say, is this bamboo stuff? Is it green? And sometimes I would say, it is green in two ways, Larry.

Brian Chee (00:53:44):
Nice. All right, so all our viewers are gonna want to know: what kinds of homework should they be doing before they give you folks a call? How do they reach out to you to get more information about Aerospike?

Lenley Hensarling (00:54:03):
Well, to get more information about Aerospike, you know, it's www.aerospike.com, and we have a developer hub, the dev hub. And the dev hub has forums, which we monitor, where you can ask questions like, you know, I've got an application I'm thinking about trying to speed up, or I'm trying to use more data to get a higher-fidelity result in a decisioning system. And they can see what's going on there and get answers there. That's where I think they should go. And what was the first part of your question?

Brian Chee (00:54:42):
<Laugh> It's just, what kinds of things should people ask themselves? What's the hallmark?

Lenley Hensarling (00:54:50):
So, yeah, I think it's like this, you know: a lot of this comes from the data scientists and the overall system. We see over and over again this conflict between the people doing the data science and the people doing the data engineering, if you will, okay? And they say, can we make the results better? You know, you mentioned that Wayfair is a customer, right? And they were able to get, you know, very real benefits to cart size by applying not, you know, a few hundred megabytes, or even gigabytes, not even terabytes; they said, if we could apply 30 terabytes each time in generating the add-on recommendations to a cart, then we could make a material difference, rather than what the developers and the operations people were saying. And when they went to Aerospike, they said, you know what, we can do that, we can do that now, and get a different result. So you really need to know what you're looking for. And the basic thing is that, you know, applying more data within a given, you know, time-bound SLA, service level agreement, yields a higher-fidelity result from a higher-fidelity model, and it translates into actual dollars for people. But if you don't clearly think that through, you know, you can spend a lot of money, you can do a lot of work, and you may not get the financial results you expect.

Brian Chee (00:56:29):
Oh, okay. So I get to blame you guys for those extra little things Wayfair throws in my basket, saying, you really want this, don't you? <Laugh>

Lenley Hensarling (00:56:40):
I will say that, you know, many of my friends, that's the way they characterize my work: oh, you're the guy who does that <laugh>. There's more to it than that.

Brian Chee (00:56:52):
Well, I tell you, we have all kinds of fun, and I was really and truly hoping some of our viewers would have more questions other than hi <laugh>. But where do you see Aerospike going in the future? Are you willing to speculate on what other things it can be used for? Where is that golden fleece that you're looking for?

Lenley Hensarling (00:57:19):
Yeah, I think that, as I said, it's a continuation of the path we've been on. You know, we've been helping people do trades in ad tech bidding. We've been helping people capture trades in, you know, stock brokerages, to do models for risk analysis and things, and apply more data to a given problem and do it more quickly, right? And as we look at the AI future, it's more of the same. I mean, one of the most interesting things I've read was in the Financial Times, about the cost of generative AI, and what a difference, like, two more percentage points make; you know, go from 5%, you know, for loans, to 7%, and all of a sudden the cost of money goes up, and can anybody afford the farms of servers that are required to actually drive that, right? And I think that being able to do that more efficiently just matters.

(00:58:28):
So we think about trying to do this stuff that needs to be done at low cost, efficiently, and within time-bound SLAs, because getting the answer later makes a huge difference versus getting the answer now, right, based on the context that's now. I sometimes joke about our business, the one we're all in here <laugh>, you know, Brian, you know, Curt: we've been trying to catch up to the present forever, right? It used to be you'd go look at your warehouse and know something from, you know, a month ago, and now we're just getting closer and closer to being able to make decisions based upon data that's current, and to make decisions in the moment as well. And I think that's what we're focused on. The algorithms change, and it just needs more data, and we're gonna be there with vector database support and proximity searches and things like that.

Brian Chee (00:59:34):
I'm gonna go and ask one last question, and that's scaling. Right now you are hitting some fairly large organizations, it sounds like, but are we ever gonna see this type of decision-making tool, cuz that's what it is, it's a tool, start getting smaller, more towards the edge? You know, I can imagine things like, gee, it'd be great if some of these algorithms were at, say, point of sale for add-on stuff, or it could be in a hotel booking system saying, well, you've booked this type of room, but for just a few dollars more you can go this way. It's all decision-making tools, trying to do things quicker instead of letting the customer walk away. Do you see your industry doing that one of these days?

Lenley Hensarling (01:00:28):
So I think we do that now, you know, through the wonders of the internet and the increasing speeds of the internet, where it's all happening maybe back up in the cloud or back in a data center, but being interacted with at the edge. We are also seeing this: we work with, you know, Amazon and Google and others, and people are making devices that will hold, I don't know, 36 terabytes, you know, on the order of that, with a fair amount of compute, in a ruggedized fashion, that can be deployed in, like, an oil field or, you know, a battlefield, you know, wherever, right? And then they capture data there, make it available even when it's not connected, and run those same decisioning algorithms; you know, push the features to it and add new data to it in place, and then still reach back over whatever satellite technology, you know, and get replenished and provide its data back upstream too.

(01:01:39):
It's important to note that, you know, it's not just one cluster. The way data works these days is data's in motion all the time. It's being refreshed continuously. You know, we had a customer in a conversation one time tell me that, you know, the difference between the past and the present in the management of data is that we used to think about using huge amounts of compute against huge amounts of data that was old. Now we think about recognizing patterns as they come across and taking action based on them right then, with the data in motion, because data's old, you know, milliseconds afterwards.

Brian Chee (01:02:21):
Fabulous. You know, it's been fun, and our audience agrees: you've been a spectacular guest. Thank you so very much for your time and ideas. I think it's time to start winding up this show and say thank you, and we'd like you to say goodbye. And where can people get more information? Do you want people to reach out to you if they have more questions?

Lenley Hensarling (01:02:53):
You bet. So, you know, as I said, our website's www.aerospike.com, and you can reach me at lhensarling, that's L-H-E-N-S-A-R-L-I-N-G, at aerospike.com.

Brian Chee (01:03:11):
Thanks a lot. You know, Mr. Lou had to beat feet because he had prior commitments, so he wasn't able to stay through the whole show. But we had a great time talking to Lenley Hensarling, Chief Product Officer of Aerospike, and we'd love to hear your ideas. I happen to be Cheebert, spelled C-H-E-E-B-E-R-T, at twit.tv. I'm also on Twitter, where I start bragging about all kinds of cool things that I'm tinkering with; I happen to be @advnetlab, Advanced Net Lab, on Twitter. You're also welcome to throw email to twiet@twit.tv, and that hits all the hosts. We'd love to hear your ideas and so forth. Now, I am gonna do one last thing, just because we're heading towards the 4th of July.

(01:04:05):
I had a public service announcement that was requested by a nonprofit called Vitalant, which is urging all eligible donors to schedule a blood donation this July. Blood donations in the week surrounding the 4th of July typically drop by 2,000 or more and are among the lowest of the summer season. But the blood supply must continue to be replenished, as emergency surgeries, treatments for blood disorders, and cancer therapies that use blood can't take a break for the holidays. Type O blood is the most transfused blood type: O negative can be used to help any patient in an emergency, and O positive can help any patient with a positive blood type. Platelet donations are also constantly needed for cancer patients, open-heart surgeries, and transplants; platelets must be used within a week of donation. As a special thank you, donors who come to give July 1 through 7 will receive an exclusive Vitalant cooler, redeemable by email. Those who give between July 8th and 22nd will receive a San Francisco Giants blood donor t-shirt. Learn more and make an appointment to give at Vitalant, spelled V-I-T-A-L-A-N-T, dot org. Now, I will point out this is in the Bay Area; however, the need for blood is worldwide, especially during big holidays. Sadly, some people drink and drive, and that causes a spike in the need for blood. So in your local community, help the people around you, help yourself: give blood. It's worth doing.

(01:05:44):
Mr. Curtis, you've got all kinds of projects going on.

Curt Franklin (01:05:50):
I do indeed. Most of mine, when I get back from vacation, are going to revolve around getting ready for some travel that I have to do. I'm going to be going to a vendor conference in mid-July, and I'm getting ready for my presentation at the Omdia Analyst Summit, which is part of Black Hat, gonna be in Las Vegas in August. I'll be at Black Hat; I'll also be at DEF CON. And as always, I'd love to see members of the TWiT Riot if you're gonna be at any of those in real time, real space. If you're not, or even if you are, feel free to hit me up. I'm on Twitter at kg4gwa. I'm on Mastodon, kg4gwa@mastodon.sdf.org. On LinkedIn, Curtis Franklin. Please follow me at any of those, and feel free to send me a direct message. Love to hear from you, love to know what you are interested in hearing us talk about right here on This Week in Enterprise Tech.

Brian Chee (01:07:04):
Well, if you're watching the live show, why don't you also join the chat room at irc.twit.tv. You know, you can also find the subscribe and download links at twit.tv/twiet, something to share with your friends. But don't forget about Club TWiT for ad-free content and access to our members-only Discord, starting at $7 a month. You can find out more at twit.tv/clubtwit. Well, thanks for watching, and until next time, just keep quiet.
 
