Akshay Sura - Partner
30 Aug 2020
In this video, I am going to go through how we tried to solve the issue of migrating assets from the Sitecore Content Management System’s Media Library to Sitecore Content Hub Digital Asset Management (DAM).
Note: The following is the transcription of the video produced by an automated transcription system.
Hey, guys, this is Akshay Sura. Today we will be talking about migrating your existing Sitecore Content Management System’s Media Library to Content Hub DAM (Digital Asset Management). Now, the reason why this is so crucial is that a lot of the customers we are seeing are moving from an existing Sitecore CMS, where they have a ton of media assets. We’re talking about gigs and gigs of PDFs, images, and videos, and these need to be somehow ported over to the DAM. And there’s no easy way to do it. So, you know, there’s a process to it. Right. So, in order for us to even start doing this, check out one of the videos we had recently about integrating DAM into the Sitecore CMS. Again, it was by far the easiest integration I’ve ever done. It was pretty straightforward. Once you do that, all new associations with your images, whether they are in image fields or in the rich text editor, anywhere you choose media from, will start pointing to Content Hub DAM, which is great.
But then we still have to go through these existing media library items in order to get them into the DAM. This is just a screenshot of a plain 9.3 instance, not a lot there. But just to give you an idea, from the media library perspective Sitecore CMS can get pretty complicated depending on what you do with it. Right. So, for instance, we are talking about how you version media and even items, because now we’re dealing with the references to media items in your regular items. If you’re multilingual, that’s an added layer of complexity. So are any customizations you’ve done with media fields, any customizations you’ve done with items that select media, and shared fields which link to media items. So it gets a little bit complicated in terms of how you version languages and how you version items inside a specific language as well. So you almost have to cater the solution to your situation or the current customer you’re working with. But this is a good start.
So, in terms of importing into Content Hub, there are different ways of importing assets. One of the recommended ways is to import by Excel. So basically, you have this Excel schema where you have different attributes for an asset inside of Content Hub, based on your domain model for DAM for that asset. Essentially, you’re filling out that row with all the metadata. You are also specifying a public URL. And then, once you upload that Excel file, Content Hub will pick that up, add those assets, and then use those public URLs to pull the files. But again, we’re dealing with a humongous media library with existing assets that are currently used on the public side. We need our existing content to change, along with the content references to those media items, so there are a lot of things you need to do. The other way is that you can upload files. So, if you can drag and drop files, great.
If you want to drag and drop zip files and then have them extracted, in the image processing section you could add additional steps to do that. But then again, the problem is that you’re dealing with gigs of data and this is browser-based. Imagine uploading humongous zip files, or even if you break them up, just the amount of work that is involved in it. The other way, which is what the Excel import uses, is that you could paste all the public URLs for your media items. Again, when we’re talking about thousands and hundreds of thousands of media items, this is not really feasible. And when you’re looking at it from an API perspective, unfortunately, there is no way of uploading a media asset as a raw file. So that’s out of the question. The only way you could do it is API based. You could create a bunch of things, but at the end of it you fall back into this, which is that you have to give a public URL for Content Hub to consume it.
So essentially, from a content management perspective, we have a ton of content in the media library. We have a ton of content in our content tree, which links to those media library items in multiple ways. So, the image field is one, rich text is one, and then any custom fields you have done. So if you take an image field, the first part you see up there is what you usually get out of the box if you link to a media item. Right. For an image. And then at the bottom, there’s an example of what you get back if you had the DAM integration for Content Hub and you created a public link. And again, remember, when you are selecting a media item from the CMS, or even in Content Hub when you’re trying to create a public link for it, the public link is unique to a specific set of conditions. So, for instance, I have a media item for my logo.
I need it in this specific height and width, for instance 200 by 400. That will be a public link. I need another one, which is twelve hundred by seven hundred, for instance. That’s the same exact asset, but it’s a second public link. Both of them are different.
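As a rough sketch of the two image field values being compared here, the Python below builds an out-of-the-box value that points at a media item id, and a DAM-style value that points at a public link. The attribute names (`mediaid`, `stylelabs-content-id`, `thumbnailsrc`, `stylelabs-content-type`) are assumptions for illustration and should be checked against the raw field values in your own instance. The ids and URLs are made up.

```python
from xml.sax.saxutils import quoteattr


def _image_tag(attributes):
    """Render an <image ... /> raw value from (name, value) pairs, escaping values."""
    rendered = " ".join(f"{name}={quoteattr(str(value))}" for name, value in attributes)
    return f"<image {rendered} />"


def media_library_image_value(media_id, alt=""):
    """Out-of-the-box shape: the field references a media library item by id."""
    return _image_tag([("mediaid", media_id), ("alt", alt)])


def dam_image_value(asset_id, public_link, thumbnail, alt, width, height):
    """DAM-style shape (attribute names assumed): the field carries the public link."""
    return _image_tag([
        ("stylelabs-content-id", asset_id),
        ("thumbnailsrc", thumbnail),
        ("src", public_link),
        ("stylelabs-content-type", "Image"),
        ("alt", alt),
        ("width", width),
        ("height", height),
    ])


# Hypothetical values: one asset, two public links for two sizes.
before = media_library_image_value("{0DE95AE4-41AB-4D01-9EB0-67441B7C2450}", alt="Company logo")
small = dam_image_value(101, "https://example.invalid/public/logo-small",
                        "https://example.invalid/public/logo-thumb", "Company logo", 200, 400)
large = dam_image_value(101, "https://example.invalid/public/logo-large",
                        "https://example.invalid/public/logo-thumb", "Company logo", 1200, 700)
```

Both `small` and `large` carry the same asset id but a different `src`, which is the point being made about one asset having several public links.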
So if you look at it, you have the asset id, you have the thumbnail, and then you have the source, the type, and the Alt, which is unfortunately just the name of the raw file which you first uploaded, plus the height and width. And we’ll talk about this, where we ran into an issue where the Alt doesn’t necessarily come from a meaningful attribute inside of Content Hub. So, we need to extend that to add an Alt attribute and then pull the value from there and modify the existing integration for DAM inside of the Sitecore CMS. So, we’ll get to that in a second. I just wanted to quickly go through this. So, from a rich text perspective, when you look at it, you see the same complexities, right? So, this is just an example of an image. But when you’re talking about a PDF, for instance, you will have an anchor link which will have a link attribute, and then it will pull the ID and assign the link at runtime. So that’s another case we need to capture. But I just wanted to quickly go through it for an image. You can have an Alt, and you can have a source with width and height, the way it currently is by default in the base configuration for the Sitecore CMS. But from a Content Hub perspective, when we add a link inside of a rich text, you don’t see the Alt, you just see the source. And as I mentioned, the source points to a public link inside of Content Hub, and that public link carries the height, the width, and the other rendering options. So, if you have custom fields that deal with media items, not just images, you would have to tweak those a little bit in order to replace anything you’re using. So basically, at the end of it, it’s find that, then find whatever the replacement is, and replace it in the text. So, it’s something which you need to work on. And again, the Alt tag is a big issue out of the box, so you need to implement customizations to fix that. What’s the ideal solution? Right.
So, in my mind, the ideal solution is: hey, here’s my Sitecore Web database, which has all the published assets inside of my media library. Look through it, pick all of them up, and then somehow magically move all of them into Content Hub. And then once you move them to Content Hub, create the public links for each of those assets, and then come back and go through my content in the Sitecore master database and update those references. So, find the references for the media items, and update those references with whatever the public links are in Content Hub. Right. And at the same time, in Content Hub, I need to be able to find these pretty easily. So maybe have an asset type called Sitecore CMS and have tags which represent the hierarchy of where the asset is. So existing content authors in the Sitecore CMS can easily find these items, which used to be in the Sitecore CMS but have now magically appeared in Content Hub.
That is what would be my ideal solution. And obviously, for each of the customers or the problems you’re dealing with, you might have to tweak this a little, but that’s the base of all of it. Now, how did we try to achieve it? So, we wanted to look at it a little bit differently. I’m going to say right off the bat that what we tried to do is be minimally invasive. What does that mean? What that means to me is I’m not trying to add a layer of, hey, you need to do a code deployment. You need to open these ports to your production CM. You need to do this. You need to do that. We’re trying to be as friendly as possible to a situation, a customer, an organization, and make their lives easy. So, you know, if you have a copy of the production database on your local while there’s a content freeze and you’re trying to run it there, great. If you’re running against a production CM, great. If you’re running against a stage, it would work. So, we tried to make it in such a way that when you’re crossing boundaries, it is as unintrusive as possible, so that you are able to run these processes. So, the first process runs in Sitecore on the Web database. We have an ASPX page that doesn’t need code compiling. You would run it locally if you need to, if you have a copy of the production database. All it’s going to do is take all of the information from the Web database. It’s going to pick up things like the item Id, because remember, we have to come back and update these things. So, the item Id, the Alt, the URL. It’s going to come up with the public URL, it’s going to package that up in JSON, and then it’s going to push it to the Azure queue. Why an Azure queue? Why can’t you run it in a local .NET Core app? Very good question.
But what you’re not going to get from a local .NET Core app, even if you do multi-threading and you tax your current system, is the ability to process this at scale. I’m not talking about ten, twenty, two hundred. I’m talking about thousands of these requests going through your queue. So if you have a beefier machine and you want to run a .NET Core app locally to do the processing, good for you. But what that involves is us making several requests into Content Hub, right? So we are saying, hey, is there an asset type called Sitecore CMS? If there isn’t, create one. Here’s an asset I want to create for the image, say logo.svg, for instance. Create the asset. Tell me what the asset is. Oh, by the way, this asset has these taxonomies, it has this Alt text. And then once the asset is done, it’s like, OK, now run the fetch job, and this is the public URL. The fetch job runs and pulls the asset inside of Content Hub, and it might take a while, depending on how big your asset is. So, something like a logo.svg might be a few hundred K, boom, a few seconds. But if you’re pushing a PDF which is 10 meg, 20 meg, 30 meg, whatever that is, you’re going to have to sit there and wait for the fetch job. 30 seconds. Check again. 30 seconds. Check again. 30 seconds. Check again. So, there’s a process involved. So yeah, you fetch, you wait, and then once it’s done, you say, hey, generate a public URL for this and let me know what it is. Right. And then you also have to take into consideration that your Content Hub instance is also performing all of these jobs. So, there’s no guarantee that I send a request and it’s done in a millisecond. No, you have to sit there, and you have to wait for these things to complete. So, I really think that Azure Functions give you a very good way of having this layer where you don’t know when and how long you’re waiting for. But at the end of it, once all of the processing is done for a single asset, it’ll get pushed back, and then it’ll be in the completed queue. And again, we’re using queues for everything, everything.
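The wait-and-check loop for the fetch job could be sketched roughly like this in Python. This is only an illustration of the polling idea, not the code from the video (which ran as Azure Functions in .NET). `is_done` is a hypothetical callable standing in for whatever request tells you the fetch job has completed, and the 30 second interval and the retry limit are assumptions.

```python
import time


def wait_for_fetch_job(is_done, interval_seconds=30.0, max_checks=20, sleep=time.sleep):
    """Poll until the fetch job reports completion or we give up.

    is_done: hypothetical callable returning True once the job has finished.
    sleep:   injectable so the loop can be exercised without real waiting.
    Returns True if the job completed within max_checks checks, else False.
    """
    for attempt in range(max_checks):
        if is_done():
            return True
        # Only wait if there is another check coming.
        if attempt < max_checks - 1:
            sleep(interval_seconds)
    return False


# Demo with a fake job that completes on the third check and a no-op sleep.
_answers = iter([False, False, True])
demo_result = wait_for_fetch_job(lambda: next(_answers), sleep=lambda seconds: None)
```

In a queue-based setup you would more likely put the message back on the queue with a delay instead of blocking a worker in a loop, but the check, wait, check again logic is the same.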
So, it traverses through queues to create the asset type, create the asset, then add the tags to the taxonomy, add the Alt tag, fetch the media. Wait, wait, wait, wait. Is the media done? Yes. And then you pick that up and you say, OK, generate the public URL. Is it done? Once the public URL is done, throw it into the completed queue. Coming to step three, we basically pull those completed queue messages for the assets, go back into Sitecore and then say, hey, you know what, in the master database, find me this media item, find all the references. So, you go to the links database, get the referrers, and you know which version of the item you need to modify. You know which item, where the sources for this media item are, and you go and update it. If it’s an image field, you know what to do. If it’s a rich text field, again, find and replace. If you have custom fields, you would have to deal with them accordingly. So it gets interesting as well as complex, depending on your item versioning, your language versioning, your customizations. So it’s a little bit involved. But today we’ll try to show you the best we can how these were updated. Setting the scene.
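To make the queue traversal a bit more concrete, here is a minimal Python sketch of a message moving from stage to stage until it reaches the completed queue. The stage names and the message fields (`item_id`, `alt`, `public_url`, `stage`) are hypothetical names chosen for illustration. In the setup described above each stage would be its own Azure queue with a function listening on it, whereas here a single loop stands in for all of that.

```python
import json

# Hypothetical stage names, in the order described in the talk.
STAGES = [
    "create_asset_type",
    "create_asset",
    "add_tags",
    "add_alt",
    "fetch_media",
    "create_public_link",
    "completed",
]


def advance(message):
    """Return a copy of the message moved to the next stage (completed stays put)."""
    index = STAGES.index(message["stage"])
    if index == len(STAGES) - 1:
        return dict(message)
    return {**message, "stage": STAGES[index + 1]}


def run_pipeline(message_json):
    """Walk one JSON message through every stage and return the stages visited."""
    message = json.loads(message_json)
    visited = [message["stage"]]
    while message["stage"] != "completed":
        message = advance(message)
        visited.append(message["stage"])
    return visited


# Made-up example of the JSON the first step might push for one media item.
example_message = json.dumps({
    "item_id": "{0DE95AE4-41AB-4D01-9EB0-67441B7C2450}",
    "alt": "Company logo",
    "public_url": "https://example.invalid/-/media/logo.svg",
    "stage": "create_asset_type",
})
visited_stages = run_pipeline(example_message)
```

Keeping the item id in the message all the way through is what lets the last step go back into Sitecore and find the references to update.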
As you can see, we have our Content Hub assets. Currently, there are only four. On the right, what you see is that I have a functions app running locally, but it’s looking at the queues which are on Azure, mainly for debugging purposes and to show you the progress. If I ran it on Azure Functions itself, it would be kind of hard for us to follow through. Here we have a CMS system. What we’re looking at is this asset, which currently is pointing to the media item that we are going to push to Content Hub, and we want to see it transform when we come back. So again, no smoke and mirrors. Everything’s working the way it’s supposed to. What we will do is push these messages, about ten of them, and then see how this transforms and how we get it back again. You know, right now we only have four, but we’ll run through these. Yeah, and then we have our .NET Core app on the left, which is responsible for pulling things and pushing things, so let’s go ahead and push. So if I push the messages, what you should see is that we pushed 10 messages. We got the messages on the right. We’re going to go through a lot of pushing messages from one queue to another and essentially processing everything. So as you can see now, the DAM queue is at zero, and as we are processing more and more things, we’re getting public links and things go to the created queue. We should start seeing messages in the completed queue. But right now, the fetching takes the longest, and sitting and waiting those 30 seconds or 15 seconds takes a long time. So, I’m going to pause here and come back when we start seeing the messages. As you can see, a lot more processing, getting public links, and you can see a couple of the links created. So now we have five. We’ll wait for ten, and then the process on the client will pick up once we finish the whole batch and get back all the completed messages for the batch we sent out.
So, it looks like it’s already processing on the client. And so, as we’re getting messages, it’s pushing those updates into the Sitecore CMS and the message count should drop. If we get back in here and refresh, we should see a ton more of the images which came in, right here. So, if you look at one of these images. And then there are other assets, looks like PDFs and things of that sort. So, we got our assets. You can see the base. We tagged it with the type Sitecore CMS. We have the Alt text which we pulled from the media library. We have a custom attribute, and we updated that with what we had in the media library for the media item. We have this CMS tag. So here what we’ve done is we’ve actually tagged it with the path of the media item so that content authors now doing work inside of Content Hub can easily find this. So, these CMS tags, the Alt tag, and the types make it easy for you to find it in search. So, we thought it would make sense for us to do that. And you can see those tags in here. And then, as you also notice, we have the public link. If I copy this link and open it up, you have the public link for that. And as I mentioned, every public link has a bunch of attributes. So, if I picked a different rendition, if I resize to a specific output, if I crop it, the public link you’re going to generate is going to be different. So, on the same exact asset, you could have multiple public links.
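One way the path-based CMS tags could be derived is sketched below in Python. This is an assumption about the approach, not the code used in the video: it takes the media item path, strips an assumed media library root, and produces one tag per folder level so that an author can search at any depth of the old hierarchy.

```python
def path_to_tags(media_path, root="/sitecore/media library"):
    """Turn a media item path into one tag per folder level.

    The root path and the slash-joined tag format are illustrative assumptions.
    The last path segment is treated as the item name and is not tagged.
    """
    if media_path.lower().startswith(root.lower()):
        relative = media_path[len(root):]
    else:
        relative = media_path
    parts = [part for part in relative.split("/") if part]
    folders = parts[:-1]
    return ["/".join(folders[: depth + 1]) for depth in range(len(folders))]


# Hypothetical media item path.
example_tags = path_to_tags("/sitecore/media library/Project/Brand/logo")
```

For the example path this gives the tags `Project` and `Project/Brand`. An item sitting directly under the media library root gets no tags.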
So now let’s go back and look at what has happened on our CMS. So, we are hoping that the link itself changed on there. So, let’s go take a look, come back in here and take a look. It got replaced with the Content Hub image tag. We have the asset id, we have the source and we have the type, which is an image. And then the Alt, which is my new best friend. And it’s pretty simple. So, you can apply the same exact principle to a rich text field to find and replace. Of course, the rich text will get a little bit complicated. You could either use the HTML Agility Pack or regex to find and replace either links as an anchor tag or images, you know, links to the media library items. But once you run through this at scale, you will have a really good piece of content which points to Content Hub now instead of your media library. And at some point, you could reduce the size of your media library. And if you have any questions, feel free to reach out to us, we’re more than happy to help you out.
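The regex flavour of the rich text find and replace could look roughly like the Python sketch below. It assumes that media links in the rich text look like `-/media/<32 hex characters>.ashx` (or `~/media/...`) inside an `img` `src` or an anchor `href`, and that you already have a lookup from media item id to public link. Both are assumptions to verify against your own content, and an HTML parser is the safer choice for messy markup.

```python
import re

# Matches the opening of an <img> or <a> tag up to a media library URL in src/href.
MEDIA_LINK = re.compile(
    r'(<(?:img|a)\b[^>]*?\b(?:src|href)=")[-~]/media/([0-9A-Fa-f]{32})\.ashx[^"]*(")',
    re.IGNORECASE,
)


def replace_media_links(html, public_links):
    """Swap media library URLs for public links; unknown ids are left untouched.

    public_links: dict of uppercase 32-character media ids to public link URLs.
    """
    def swap(match):
        link = public_links.get(match.group(2).upper())
        if link is None:
            return match.group(0)
        return match.group(1) + link + match.group(3)

    return MEDIA_LINK.sub(swap, html)


# Made-up rich text and lookup table.
links = {"0123456789ABCDEF0123456789ABCDEF": "https://example.invalid/public/logo"}
rich_text = '<p><img alt="Logo" src="-/media/0123456789abcdef0123456789abcdef.ashx?h=200&amp;w=400" /></p>'
updated = replace_media_links(rich_text, links)
```

The height and width query string is dropped on purpose, since the public link is expected to carry the size itself.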
Thank you so much for watching this video. If you need help in Content Hub or Sitecore, we live and breathe Sitecore. So just reach out to us on public channels. Here’s some of our information. If you want to get in touch with us, follow us on LinkedIn, Twitter, and YouTube, we produce quite a bit of content. So hopefully you’ll be able to follow us on there and keep up to date with what’s happening in the Sitecore world. Once again, thank you so much for watching this video.
If you have any questions, please get in touch with me. @akshaysura13 on Twitter or on Slack.
Akshay is a nine-time Sitecore MVP and a two-time Kontent.ai MVP. In addition to his work as a solution architect, Akshay is also one of the founders of SUGCON North America 2015, SUGCON India 2018 & 2019, Unofficial Sitecore Training, and Sitecore Slack.
Akshay founded and continues to run the Sitecore Hackathon. As one of the founding partners of Konabos Consulting, Akshay will continue to work with clients to lead projects and mentor their existing teams.