
Open-Source vs. Closed-Source LLMs: Navigating the Tradeoffs for Your Enterprise

In this installment of the Building Blocks of AI series, we explore key topics in AI, focusing on the differences between open-source and closed-source models. The video covers privacy and security concerns, particularly around sending sensitive data to third-party APIs, and addresses the risks of using open-source models, including potential cybersecurity issues like malicious code in repositories. It also discusses the importance of control and consistency in AI models, especially when used for automation.

Jacob Ortega: This is our newest installment of the Building Blocks of AI series. I am sitting down here with two experts in the industry, and we’ve got some pretty hot topics that we’re hearing a lot about, which we wanted to make sure we give some time on the stage to, so we can really flesh them out. My name is Jacob Ortega with Lydonia. I’m a Solutions Engineer. I’m joined here by Vinny LaRocca and Dan Leszkowicz from Pienso. Do you mind introducing yourselves?

Dan Leszkowicz: Sure. I would also say “expert” is a loose term. Dan Leszkowicz here. I head up Sales Alliances for Pienso. Quick elevator pitch: Pienso is a no-code, low-code AI platform. Specifically, it allows businesspeople to use, interact with, and fine-tune large language models.

Vinny LaRocca: I’m Vinny LaRocca, the Director of Engineering at Lydonia. I’ve been working with AI now for roughly 10 years, originally focusing on the application of AI to guide industrial robotics and then moving more into computer science software land and LLMs as transformers became more popular.

Jacob Ortega: This is a little bit of a departure from the machine world, but similar kind of themes here. It’s kind of hard to think of a world without AI and Gen AI, especially in 2024. I think something we hear a lot about is what are the differences, the benefits, and the overall usages of closed-source versus open-source models.

Dan Leszkowicz: Maybe definitions first. I think we probably have a spectrum of comfort. Open-source models are exactly what they sound like: they’re open. They’re more openly accessible. Specifically, the weights behind the model are known. We’re able to use and manipulate them. Closed models, on the other hand, are more proprietary models. They tend to be provided by organizations like Cohere, OpenAI, and a lot of the big names that dominate the space. The way those organizations provide the model is they pre-train it, then stand it up behind an API, allowing users and customers to access that API. So, you send your data to it, and you get back the result – the answer, the response. You don’t necessarily get to see the guts of the model. Ask a question, get an answer.
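
To make that access-pattern difference concrete, here is a minimal sketch of the two styles in Python. The endpoint, model names, and environment variable are illustrative assumptions, not a recommendation of any particular vendor or model:

```python
# Closed-source: the model sits behind a third-party API; your text leaves your environment.
import os
import requests

resp = requests.post(
    "https://api.openai.com/v1/chat/completions",           # third-party endpoint
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "gpt-4",
        "messages": [{"role": "user", "content": "Summarize this support ticket ..."}],
    },
    timeout=30,
)
print(resp.json()["choices"][0]["message"]["content"])       # you get the answer, never the weights

# Open-source: the weights are downloaded once and run entirely inside your own environment.
from transformers import pipeline

generator = pipeline("text-generation", model="meta-llama/Meta-Llama-3-8B-Instruct")
print(generator("Summarize this support ticket ...", max_new_tokens=200)[0]["generated_text"])
```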

Jacob Ortega: Some major differences, right? And I think there’s probably some trade-offs, maybe some benefits. Can we touch base on that real quick?

Dan Leszkowicz: The trade-offs come in a couple of different flavors. I think one that frequently just kind of bubbles to the top is the question of privacy and security. Because of the nature of what I just described – where the closed-source models are accessible via an API that exists somewhere – you send your data to it to use it, and you literally have to send your data outside of your secure environment. Whatever your IT posture is – on-prem, AWS, choose your flavor – you’re sending your data to some third-party API. For some organizations, honestly, that’s perfectly fine based on whatever data they’re using. I would say a lot of the customers we work with at the enterprise scale are very protective of their data. It is often comprised of their customer information, including potentially PII. And there’s another half of our business that is focused on the government – think Department of Defense, Intelligence Community. That type of data is super sensitive and literally cannot be sent outside whatever bunker it lives in. So, going to a third-party API is a non-starter.

Jacob Ortega: Makes sense. Vinny, have we seen anything similar?

Vinny LaRocca: Yeah, I think people are hypersensitive to their data. They want to know where it’s going and what it’s being used for. And I think a key piece of this is to really understand what the goals of some of these companies are. So, if you think about OpenAI, their mission is to create AGI, artificial general intelligence, and although that is an exercise in refining the architecture and making the model more efficient to bring cost down and things like that, really it is an exercise in mining data and training as much data into the model as possible. I’m sure we’ll cover more of this, but a bunch of different problems come from that. But, if you’re talking to those models, you are donating your data to the model. And there are certain cases, like Dan said, where you might be fine with that. If I’m using it to help me write a blog post, and I want it to refine grammar or something like that, great, that’s fine. I don’t really care that it’s getting my data. But, if I’m taking customer data and I’m putting it in there, there are some questions and some implications that arise there.

Dan Leszkowicz: Yeah, I’d love to just double-click on that. You raise a really good point. I think the concern is around data leakage – once you’re interacting with the model, how much of that data is retained, and how much of it is used to further fine-tune the model? Honestly, I think organizations have different perspectives on this. Some are either okay with a portion of that risk or are okay with using maybe a solution like OpenAI because it’s sort of under the Microsoft umbrella, and Microsoft is a vendor and all that sort of stuff. I think our perspective, what we recommend to customers, is if you are really concerned about your data, you want to retain full control over it. That typically means using an open-source model that sits within your environment. And we’re only five minutes into the discussion, so please don’t take that as I just solved it – open is better. There’s more to discuss, or we wouldn’t have had this be 30 minutes. But, on the privacy and security side, that is one frequent consideration for our customers.

Vinny LaRocca: I think there are two sides to the coin when we’re talking about security, too. There’s data privacy, which people will coin as security, and then there is the actual cybersecurity of the model. I think those two things are different, and actually, there’s a dichotomy between the two. An open-source model is going to be a great option for retaining that data privacy and making sure that there’s no data leakage. From a cybersecurity perspective, there are additional things that you need to take into consideration if you’re going to use an open-source model.

Jacob Ortega: Maybe we can double-click into some of those.

Vinny LaRocca: A good example is what’s going on in Hugging Face right now. People are making cloned repositories of popular models – take Llama 3 as an example. If I’m a developer and I’m pinging a repository to pull that model in, there are other repositories that have a very similar name. It’s really easy to see how a developer could go wrong while building, and that model comes in with malicious code in it. That malicious code embeds itself and then starts to do things in your environment that it shouldn’t. So, there are tools now that will sort of fix that. Pienso fixes that problem. They’re an example of sort of improving that version of open-source. But if you’re just taking an open-source model off of a GitHub repo or Hugging Face, that is a concern and something you have to be careful of.
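
One simple guard against the look-alike-repository problem Vinny describes is to pin the exact publisher, repository, and commit you intend to pull rather than free-typing a model name. A minimal sketch, assuming the transformers library; the commit hash shown is a placeholder:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Pin the verified org/repo and a specific commit so a similarly named clone can't slip in.
REPO_ID = "meta-llama/Meta-Llama-3-8B-Instruct"           # verified publisher, not a look-alike
REVISION = "0123456789abcdef0123456789abcdef01234567"     # placeholder commit hash to pin

tokenizer = AutoTokenizer.from_pretrained(REPO_ID, revision=REVISION)
model = AutoModelForCausalLM.from_pretrained(REPO_ID, revision=REVISION)

# Also avoid trust_remote_code=True for repos you haven't audited; that flag
# executes arbitrary Python shipped inside the repository.
```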

Dan Leszkowicz: It works for a developer playing around in their basement. It doesn’t work at enterprise scale.

Jacob Ortega: Seems like we’re revolving around this theme of ownership and control, right, and configurability. In the realm of LLMs, what is that, and what does that really mean?

Dan Leszkowicz: I’ll take a shot, and I know you’ve actually experienced this live, so I want your thoughts too. When we tend to talk about control, we’re talking about control over the model itself. So, let’s take an example, GPT-4, which has been out for, I don’t know, 16 months or something at this point, somewhere in that neighborhood. It would be totally reasonable to assume that GPT-4 is the same as GPT-4 has always been – same name. In reality, that model behind that API changes. So, every once in a while, every handful of weeks, every couple of months, there are new checkpoints pushed – basically, sort of like micro-updates to the model. Now it’s still GPT-4, sort of, right? You and I at home, just kind of asking ChatGPT for a recipe or trivia questions or whatever, we don’t notice it because it’s still probably correct – or it’s correct enough for our purposes – or we can guide it. The problem comes from the fact that if GPT-4 is powering some sort of trend line on an executive dashboard – maybe it’s customer sentiment or what customers are calling about in a contact center, something like that – and everything’s humming along just fine, and then we come in on Monday morning and there’s a giant spike or a giant dip. Well, in our experience, that is frequently due to the fact that a checkpoint was pushed, and there was no means of knowing that the checkpoint was coming or catching that it came. So, someone’s getting an email about that dashboard, right? “Why did we have this big change in performance?” And so, what we’ve done with some customers is ask them, as we kind of deploy Pienso in parallel, to use one of those APIs and see how reliable it is over time. What they found is that lack of control – that shifting checkpoint – basically made the inference results unreliable over a long period of time. I think it’s just that most organizations right now are really working their way through POCs. They have not gotten to production scale. And it’s only when you get to production scale, and kind of that longitudinal timeframe, that you start to realize things shift over time.

Vinny LaRocca: We had a customer who had built a large amount of automation off of ChatGPT. They came in one week, and OpenAI must have pushed a micro-update. I forget which model this was for, but it was one of the models that GPT was running on, and basically, the model started responding in a combination of English and Spanish. That’s an example of that kind of micro-update. It got corrected, but the problem with automation is that when that kind of stuff starts happening, you don’t have any recourse. You may have automated this process, and it’s been running fine for a year. Well, now you’re down for a week. I can’t do any break-fix because I don’t control the model. Well, now what do you do? You’ve got to pull it back to a manual process, but nobody in your organization knows how to do it anymore. So, especially when we’re talking about automation, static is always better. The more static an environment can be, the more stable the automation will be in the long term. I think it’s worth explaining how that occurs. When you’re fine-tuning a model, it’s basically like this big, 3D scatter plot, and there are all these words all over the place. A really simple example: if I had dog and fire hydrant, those words are going to be totally unrelated, right? They have nothing to do with each other. But if I prompt the model or I’m doing fine-tuning, and I start to teach it that my dog is actually named Fire Hydrant, well, those two words now sit right next to each other on the scatter plot. So, when you have these closed-source models that are just pushing micro-updates all the time, that scatter plot is changing. It’s not changing super significantly – not enough where, if I was just talking to it, I would necessarily notice a difference, because it’s still the same model, still the same architecture, and it still understands, for the most part, the same data. But again, especially when you’re dealing with automation – and when I say automation, I’m just referring to something that I’m not necessarily interacting with day-to-day; it’s kind of just running or doing something in the background – if they push one of those micro-updates, those prompts are no longer producing the result they once were. I’ve got to go back in and re-engineer all of that, and that’s a problem. You just can’t do that at scale, right? So again, there’s a place for those models. If I’m just chatting with the model, I’m not going to notice the little fine details of differences in what it’s responding to me, and that’s fine. I don’t have to worry about it.
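
Vinny’s scatter-plot intuition can be checked numerically: compare the cosine similarity of two terms’ embedding vectors before and after an update. The vectors below are made up purely for illustration; a real check would pull embeddings from the model in question:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """1.0 means the vectors point the same way; values near 0 mean the terms are unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-D embeddings (fabricated for illustration only).
dog_before, hydrant_before = np.array([0.9, 0.1, 0.0]), np.array([0.0, 0.2, 0.9])
dog_after,  hydrant_after  = np.array([0.9, 0.1, 0.0]), np.array([0.8, 0.2, 0.3])

print(cosine_similarity(dog_before, hydrant_before))  # low: "dog" and "fire hydrant" start unrelated
print(cosine_similarity(dog_after, hydrant_after))    # high: teaching "my dog is named Fire Hydrant"
                                                      # pulls the two terms together
```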

Dan Leszkowicz: I’ve got to double down on one of the points that you made. Part of the problem is that you don’t necessarily know it’s going to be a problem. I had a boss at one point, and one of his favorite sayings was, “problems aren’t problems; surprises are problems.” So, the real issue is you don’t know if the checkpoint is coming, and you don’t have a way to identify it in real time. So, back to my sort of Switzerland perspective of open and closed, which are both great: there are ways to use closed-source models responsibly. One suggestion is implementing some sort of, what I’ll call, evaluation – making sure that the prompt that you built yesterday is still functioning the same way today, rather than just blindly accepting the output of the automation that we’ve been describing.
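
The evaluation Dan suggests can be as lightweight as a frozen set of prompts with expected answers, re-run on a schedule; a drop in the pass rate is how a silently pushed checkpoint shows up. A minimal sketch, where call_model is a placeholder for whichever API or local model you actually use:

```python
# Minimal regression check for prompt behavior over time.

REFERENCE_SET = [
    {"prompt": "Classify the sentiment: 'The agent resolved my issue quickly.'", "expected": "positive"},
    {"prompt": "Classify the sentiment: 'I was on hold for two hours.'", "expected": "negative"},
]

def call_model(prompt: str) -> str:
    """Placeholder: swap in the closed-source API call or local model you actually use."""
    raise NotImplementedError

def pass_rate() -> float:
    passed = sum(
        case["expected"].lower() in call_model(case["prompt"]).lower()
        for case in REFERENCE_SET
    )
    return passed / len(REFERENCE_SET)

if __name__ == "__main__":
    if pass_rate() < 0.95:   # the threshold is a per-use-case judgment call
        print("ALERT: prompt behavior changed - investigate before trusting the dashboard")
```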

Vinny LaRocca: You have to just understand what the purpose or the goal of OpenAI is. They’re not looking to make an automation tool; they’re not looking to create something that’s going to be static. They want to create AGI, so they’re going to continue to train more and more data into it.

Jacob Ortega: That’s a powerful point. I think that’s actually another good segue to another topic here. I know we’re covering a lot of topics here, folks, for you at home. If you have any questions, please feel free to throw them in the chat, and we will get to them at the end of our session today. But I think, jumping back to that segue, right, we’re talking AGI. We’re talking kind of the next generation. AI, Gen AI, in the last 12 to 16 months, has really hit the ground running. But as you guys are looking at this, it’s not just right now or the next couple of months; it’s the long-term perspective. And how do we start to build what we need to get there? So, it really does bring up this question, when we’re looking at closed-source and open-source: is there something that is, quote-unquote, more intelligent or more powerful, or something that’s going to fit that mark long term?

Vinny LaRocca: Yeah, I want to touch on a couple of things there. There was some benchmarking that was done recently, and what was determined is that the open-source models are always roughly six months behind the closed-source ones. Now, that may or may not seem significant to certain people, but the reality is it’s really not all that significant, especially with some of the trade-offs. The benefit of closed-source at this point really isn’t that it’s better; it’s just that it’s easy. It’s right there. I can log in; I can start talking to it. I don’t necessarily need to fine-tune a whole open-source model if I’m just looking for it to correct grammar, have a conversation, and maybe make some LinkedIn posts for me. And not to trivialize that usage, but that’s something where the advantages of open-source aren’t quite significant enough to warrant that level of effort in actually standing it up. I think the other thing people need to keep in mind is that there are, very literally, different types of transformer neural networks being built. The advent of large language models is really a subset of machine learning. When you started hearing about ChatGPT and all this, it was really transformer neural networks becoming powerful enough to start to do significant tasks, whereas, in the past, we were using recurrent types of networks. And there are different types of transformers. There’s encoder-based, there’s decoder-based, and there’s hybrid – a whole bunch of different types. If you were to take a decoder-based model and simplify it, it essentially means that it’s very good at putting together a response to prompts, whereas an encoder-based model is going to be really good at understanding and analyzing large amounts of data and really getting a sense for what you’re asking, plus a bunch of context around what you’re asking as well. ChatGPT is a decoder-based model. So part of the issue arises there: if I wanted it to create a LinkedIn post, well, that’s a great use of a decoder model; I don’t necessarily have to give it a bunch of context to have it do that. But one of the hallucination issues with ChatGPT is in part because it’s built on a framework that, very literally, does a better job responding than it does understanding. So it can misunderstand what you said, and then it can hallucinate and make the answer sound like it came from a genius. And you go, “oh, well, that sounds very intelligent, so I’m going to believe it,” and people just need to be aware that that’s happening.
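
A rough way to feel the encoder/decoder split Vinny describes, using the transformers library; the specific model checkpoints are illustrative choices, not endorsements:

```python
from transformers import pipeline

# Encoder-based (BERT-style): built to read and score text - classification, understanding.
classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
print(classifier("My package arrived broken and support never called back."))
# -> something like [{'label': 'NEGATIVE', 'score': 0.99...}]

# Decoder-based (GPT-style): built to generate a fluent continuation of a prompt.
writer = pipeline("text-generation", model="gpt2")
print(writer("Draft a short LinkedIn post about our new AI webinar:", max_new_tokens=60))
```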

Jacob Ortega: That absolutely does make sense. Dan, any other thoughts on the next generation?

Dan Leszkowicz: I think part of the question behind the question is, like, what is smartest. I hesitate to use that term because I don’t want to amplify models, but that’s kind of what we’re asking. That changes with time. I really like the point about the six-month lag. I think that’s valid. If you look back a year, closed-source were the frontier models, and they were way ahead. They’re probably still, technically, the frontier models, but with Llama 3.1 coming out two or three weeks ago, that gap is drastically narrowing. Our perspective, going back years, was that that gap, at some point, will close and maybe leapfrog. It’ll depend on the model and the vendor and everything. So, we have made the bet on open-source models to a large degree anticipating that. The other thing that I’ll add to Vinny’s analysis of different types of models: ideally, we’re not just looking at one model as, like, this is the smartest model in the world and I’m going to do everything with it. If we can approach it more as sort of best-of-breed solutions, there might be a model that is really good for processing invoices. That doesn’t need to be the model that you also use to analyze customer sentiment in, like, a contact center. It probably shouldn’t be. The ChatGPT-type models – GPT-4, even Llama 3.1 – are really good for those sort of agentic experiences where you’re having almost a conversation, right? And you’ve got to drill deep. You want a thought partner. I really like using ChatGPT for some things, like writing marketing emails. I also use it to look up a braised short rib recipe. Between the hours of 9 and 5, though, I probably shouldn’t be caring about that. I care about something that understands my business data. And so, from a performance perspective, an accuracy perspective, you’ll sometimes go farther with a much smaller model rather than the biggest, baddest one out there.

Vinny LaRocca: Yeah, you guys are solving for this in a really interesting way. You’re sort of future-proofing yourself around these open-source models, because it’s not really open-source versus closed-source. You can make that one-to-one comparison and say, well, there’s a six-month lag. But when you look at what Pienso is doing, they’re taking a bunch of small, specialized models for the individual tasks you need within whatever thing you’re trying to do. So, they may have an embedder or an encoding model that’s going to understand some vector database, and that’s going to pass information to some other type of model that’s going to respond, and then there’s another model that might be producing some sort of confidence threshold. It becomes this sort of symphony of small, specialized models doing very specific tasks, which makes the comparison unfair – because I’m not comparing one model to one model; I’m comparing ten to one, and ten is going to be better because, again, I’m more specialized in what I’m doing. I think you’re sort of future-proofing what the analysis is going to look like, because if that one-to-one open-source versus closed-source comparison does flip, and open-source starts to become better, well, now we’re taking closed-source, which is already worse, and we’re comparing ten to one. And now it’s, like, really unfair.
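
As a generic illustration of that “symphony” pattern – a hypothetical pipeline shape, not Pienso’s actual architecture – the idea is a few small, specialized models chained behind a single function:

```python
# Hypothetical "symphony" pipeline: each stage is a small, specialized model.

def embed_query(question: str) -> list[float]:
    """Encoder model: turn the question into a vector for searching a knowledge base."""
    raise NotImplementedError

def retrieve_context(query_vector: list[float]) -> str:
    """Vector-database lookup: pull the passages most relevant to the question."""
    raise NotImplementedError

def generate_answer(question: str, context: str) -> str:
    """Small decoder model: draft an answer grounded in the retrieved context."""
    raise NotImplementedError

def score_confidence(question: str, answer: str, context: str) -> float:
    """Separate verifier model: estimate how well the answer is supported by the context."""
    raise NotImplementedError

def answer(question: str) -> str:
    context = retrieve_context(embed_query(question))
    draft = generate_answer(question, context)
    if score_confidence(question, draft, context) < 0.7:   # threshold is a design choice
        return "Low confidence - routing to a human reviewer."
    return draft
```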

Jacob Ortega: That makes sense. We might see that gap bridged as more organizations start to adopt more mature deployments of AI. So, I think only the future will really be able to tell, but I know something that’s on the top of everyone’s minds, and might be one of the leading deciding factors between open-source and closed-source, is how to understand budgeting and payment. And really, the dollar question, I think, has yet to be answered at true scale in production. So, how should people be thinking about that approach as they’re getting started?

Dan Leszkowicz: It is always going to depend. My typical caveat anytime we’re talking to a customer and they have a budget question is: it depends. It depends on the use case, the type of data, how clean it is to begin with, how many times we call the model – that sort of stuff. The main difference from a cost perspective between open and closed, though, is that if you’re using a closed-source model, you are paying based on some measure of volume. If you’re doing it directly with the LLM vendor, it’s typically per token. If you’re using that model via some abstracted software layer or some other vendor, it’s maybe per word, per document, or per page – something along those lines. So, that pricing model tends to look really appetizing at the get-go, because we’re talking fractions of a penny to do XYZ. The problem, to your point, though, is most organizations are really only tackling those sorts of things in POCs. Maybe it’s synthetic data, maybe it’s one little business unit, and so if whatever amount of AI we’re all doing today is going to hockey-stick, and that linear pricing continues, AI becomes negative ROI, untenable, somewhere in that territory. So, our perspective is, let’s stay away from volume-based pricing, because that makes it really tough to justify not only doing AI but experimenting with AI. Ideally, to get the best AI outcome, you’re using a ton of data, you’re having a bunch of people experiment, you’re trying different prompts, and then you’re doing it all again. And that’s really tough to do with a closed-source model where you’re paying for all of those things before you know what the outcome is going to be. You don’t necessarily know whether that prompt is going to work, but you must experiment.

Vinny LaRocca: I think the analysis traditionally with open-source versus closed-source would sort of be: I have to pay more for the closed-source because it’s not my model and I pay per transaction, or, on the other hand, I can download this model, and if I have a decent enough computer, I can just run it locally. From that perspective, it’s like, “oh, well, this one’s a no-brainer,” but that thing requires very expensive and highly technical resources to be able to fine-tune, manipulate, and actually use properly. So, that’s another major problem of open-source that Pienso is solving for by putting a low-code – really no-code – manipulator on top of those models.

Dan Leszkowicz: It really is sort of like LLM orchestration in a box. It provides that kind of turnkey solution, which is part of what gets overlooked when people are budgeting for a closed-source model. It is super tempting, and it’s sort of the easy button, to count the number of tokens in your average document, do the multiplication, and say, “this is what this product is going to cost me.” That doesn’t account for all the stuff that Vinny just alluded to, though. If you are using just a closed-source API, you’ve got to have some number of data scientists and ML engineers to interact with that model. You’ve got to come to it – especially if you’re doing any sort of fine-tuning – with some labeled data. That’s a cost-intensive process, or one you’ve got to outsource. And so, all of those things are project costs that don’t really get encapsulated in that per-token pricing. Our goal is to offer a package that provides more of an end-to-end solution so that folks don’t have to itemize every little spend required to do AI.
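
Dan’s “easy button” math looks something like the sketch below. The per-token rate, document size, and volumes are illustrative placeholders; the point is only that the bill scales linearly with usage, which is exactly what hurts once a POC becomes production:

```python
# Back-of-the-envelope, volume-based pricing: cost grows linearly with usage.
PRICE_PER_1K_TOKENS = 0.01      # illustrative rate, not any vendor's actual price
TOKENS_PER_DOCUMENT = 3_000     # average document, prompt plus response combined

def monthly_cost(documents_per_month: int, calls_per_document: int = 1) -> float:
    tokens = documents_per_month * calls_per_document * TOKENS_PER_DOCUMENT
    return tokens / 1_000 * PRICE_PER_1K_TOKENS

print(monthly_cost(5_000))            # POC scale:        $150 per month
print(monthly_cost(2_000_000, 3))     # production scale: $180,000 per month
```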

Vinny LaRocca: We went through this exercise, right? We were trying to build a model that was going to mine transcripts, and we were trying to figure out, prior to Pienso, how do we minimize this transcript? Because we can’t send the whole thing; it’s too much money. So, now you’re doing unnatural things to try to bring the cost down. Whereas with an open-source model, it’s just, “well, I actually want to give it as much as I possibly can to make sure that it arrives at the right answers. I want to give it all that context, and I’m not starting with half a deck.”

Jacob Ortega: I think that’s actually a good segue. I want to make sure we leave some time to answer some of the questions from the audience here, and one of the questions being asked is, “what level of knowledge do I need to be able to select the correct models for the symphony of models? How do we apply that to the use case?” I think that question goes along with what you’re talking about, where there’s this orchestration effort that needs to kind of go into this, right? And I think people are trying to understand what the easiest way is to do that.

Vinny LaRocca: I mean, I would say with Pienso, the real understanding that you need to have is of the data itself that you’re manipulating. That’s really the analysis that they did when they started building the tool. You can come to me and say, “hey, I want you to build this model, and this is what it’s going to do,” and that’s great, but I don’t necessarily understand your world, right? I don’t understand the document, and I don’t understand the data in it. So, I’ve got to have somebody over my shoulder really telling me what to do as I’m doing it. And really, it’s a similar analysis to most of these other adjacent low-code, no-code tools, like UiPath, where it’s, “well, you understand the process, I don’t.” So, if I can teach you to do it, you’re going to get more bang for your buck by doing it that way. If you were going to do this by grabbing something off of Hugging Face yourself, yeah, you’re going to need to be advanced in Python, you’re going to have to know some serious data science, and it’s probably not something you should be getting into unless you’ve really done some intense programming work in the past.

Dan Leszkowicz: Just on the symphony: we’ve talked a lot about how different models are good for different things – maybe classification versus data extraction versus summarization – and yes, absolutely. And then part of that symphony is also sort of the checks and balances that are implemented within the platform, verifying the result of a given prompt or a given response. There are different models that are better at verification for different types of tasks. I don’t necessarily suggest that an organization looking to start an AI initiative go through the effort of evaluating all of those potential models and trying to narrow it down. We’ve abstracted that for users so that they don’t have to. They present a given task they want to accomplish in Pienso, and they don’t have to decide which model is best for it. That’s already decided by virtue of the platform and the flow engineering that we’ve done. So, for certain types of tasks and certain types of data, you will be guided to the appropriate model.

Jacob Ortega: That’s awesome. Then the big one – the big question that I’m hearing about every single day: agentic process automation. How is this relevant, what does it mean, and what are we hearing?

Vinny LaRocca: So, the agent stuff is really cool. I think that’s sort of the next big evolution in LLMs and where this is all going. I’m not aware of any products that are really doing it well right now. There are some adjacencies – some people are calling it agent automation, and there are a couple of different terms for it. But if you hear “agent,” that’s a dead giveaway that we’re kind of talking about the same thing, even if it’s not quite that. There’s going to be a similar conversation when we start getting close to AGI – there’s no good definition of it, so there’s going to be this big argument. Some people will say, “I did it,” and there’ll be other people who are like, “no, that’s not it.” Well, there really is no definition. But the general idea, if you take it to its extreme – what the end game of the agent stuff is – is: hey, I want you to take this so-and-so document that I manipulate every day, I want you to change it like this, I want you to download this other thing, I want you to combine it into one document, put together a presentation, and I want you to email it out to this person, this person, and this person – and it will just go do that. There are a lot of adoption issues with that kind of thing, because now you’re hooking it up to systems and you’re letting it actually go execute things. I think the stage before that probably looks something like the architecture that an Amazon Alexa runs off of, where you’re talking to it, it has some NLP understanding, it says, “okay, I understand what thing you want me to do,” it drives it into a bucket, and that bucket has a code-based automation. So, it’s not like I describe something and it builds the code to go do it and then executes it. It’s doing pre-planned things – I’ve said I want you to do these things, and it won’t execute outside of that. I think that’s going to be the first wave of it, and I think, from an adoption standpoint, that’ll be really straightforward to get people on board with, because it’s not that different from what we’re doing with automation today. Once you make that next leap into true AGI – letting it go through your systems, putting it on a server, allowing it access to a bunch of stuff like an employee – there are a lot of security concerns and a lot of adoption issues that we’re going to run into.
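
A minimal sketch of the “bucketed” first wave Vinny describes: the model only classifies a request into a predefined intent, and each intent maps to an automation that was written and reviewed ahead of time, so nothing executes outside that list. Here classify_intent and the two automations are hypothetical placeholders:

```python
# First-wave "agent": the LLM only picks an intent; execution is limited to pre-built automations.

def classify_intent(request: str) -> str:
    """Placeholder: an NLP/LLM call that maps free text to one known intent label."""
    raise NotImplementedError

def build_weekly_report() -> None:
    """Placeholder automation written ahead of time by a person."""

def send_summary_email() -> None:
    """Placeholder automation written ahead of time by a person."""

ALLOWED_AUTOMATIONS = {
    "weekly_report": build_weekly_report,
    "summary_email": send_summary_email,
}

def handle(request: str) -> None:
    intent = classify_intent(request)
    action = ALLOWED_AUTOMATIONS.get(intent)
    if action is None:
        print(f"No pre-approved automation for intent '{intent}' - escalating to a human.")
        return
    action()   # runs only code a person wrote and reviewed; the model never improvises actions
```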

Jacob Ortega: Definitely makes sense. I think we’re just starting to consider that right now. So, bringing it all the way back to the top: open-source versus closed-source models. Based on your guys’ experience and opinions, which is the better approach?

Dan Leszkowicz: So, this is the part of the webinar where I offend either one half or the other half of the people on the phone, but I don’t think I’ll do that. I will say honestly that there’s not a one-size-fits-all answer for that. It depends on what you’re doing. I personally, in my way of working within Pienso, use both, but it depends on the task. We threw out some examples where a closed-source, GPT-type agentic interface is super appropriate, and I definitely use that every day. For more discriminative tasks – tasks that need to happen at scale with high precision, for automation purposes or analytics purposes – we tend to gravitate toward open-source models. But it is very much a trade-off, and I expect the space to continue to leapfrog as individual models improve. The last little bit I’ll mention is your question earlier around agentic automation. I don’t know if this was intentional, but it’s a nice little teaser for our next session, which is agentic interaction with LLMs versus directive manipulation. That’s beyond the scope of the last 30 seconds here, but tune in next time.
