Using a separate SQL DB to create content pages, should I do it? Suggestions needed on interlinking as well

I am building a gaming website and am encountering a minor issue. Some of the data includes information on items, shops, armor, quests, and other relevant details. I have a custom program that reads the game files and extracts the data into a SQL database so I can easily tie everything together, but now I need to get this into Umbraco so I can display it.

Originally, I was going to use CMSImport to import it all into Umbraco and create a Content node for everything, but with this easily topping 60k pages and there being complex relationships between the pages that CMSImport can’t handle, I was thinking it might be easier if I just wired Umbraco up to my SQL Database to pull information from and then somehow display it.

I am leaning towards the separate DB option, mainly due to the problem of actually importing that much data, moving it around, and having to manually go through all the pages to handle the complex linking. However, one significant problem this creates is how to handle content for those pages that would not be stored in the separate DB. For example, each content page has “universal” properties that cover SEO and a few other things, as well as data I can’t scrape (like a full walkthrough for a quest).

I was also thinking of a hybrid approach: create a basic content node for each entry that holds the non-scraped properties, then code the template to read the rest from the other DB. But that still means creating all those content nodes and having to move them around.

Another option I was considering was using the Umbraco API to import everything manually; that way I know all the relations and other data CMSImport can’t handle are there, but that still leaves the problem of having such a huge site and moving things around.

I say having all that content and moving it around is a problem because I did a test import of just 5200 documents containing only the names of all the quests. While the import itself was fine, uSync was not able to push that much data at one time, and when I did an export/import of the data it took something like 40 minutes just to import it.

So while I have ideas and a few approach options, I am not sure what the right way to approach this might be, and am looking for thoughts and ideas. I do know Umbraco sells Umbraco UI Builder, which would probably work, but I can’t spend 4k on that. 4k is about two months’ income for me, so a bit out of my price range.

My personal rule of thumb: if it’s not content that needs to be editable from Umbraco, then don’t import it as nodes. Nodes are large objects with all kinds of metadata; if you don’t need the metadata, then don’t import it.

If you still want to be able to edit the content, but don’t need features like scheduled publishing, workflows, etc., then you can make it all editable with Umbraco UI Builder, which will give you editing capabilities right from the Umbraco backoffice.

I hadn’t read your last paragraph at first: UI Builder does have a free option for a single collection, if you can get away with that. Otherwise you’re going to have to build your own, I guess. I will say that importing them as nodes is probably not going to give you the best experience, and you’ll be battling caching and performance problems at some point.

So if I am not importing as nodes, how do I deal with the parts that do need to be edited from Umbraco, like the SEO properties, the walkthrough, and other things? And how do I deal with interlinking from one page to another?

Or is it time to seriously look at moving away from Umbraco to some other software suite? I once tried to program the site on my own, but I am not good enough to handle security and an admin interface. I chose Umbraco because of the backend interface it gave my content writers, an easy-to-use place to write their guides and news articles and such. But then there is this side of the site that is more static, but could change later at the will of the developers (and has had changes in past patches).

By all means, please do what you can to import all the data as nodes; it should work, it just seems like overkill for what you’re trying to do!

You can interlink to UI Builder elements I believe, not sure how that works :slight_smile:

To be fair, your use case is a little bit of an edge case, which is not quite what Umbraco was built for. I wouldn’t really think this is something any other system is particularly good at either.

If a node is created for each item, do you have suggestions on how to move that data around? uSync does it, but it is really slow, and my understanding is that this is because each imported node kicks off an Examine index of the page, slowing things down. Is there a way to temporarily disable Examine indexing while the import is happening and then turn it back on after? I am mostly concerned with getting the data into the production site. One thought I had was to sync all data from prod to staging, disable Examine in staging, import, then do a copy/restore of the DB to prod, but I don’t know how well that would work, especially with some production settings being different from staging.

It’s not going to be faster than what uSync can do.

Hard to give specific recommendations, but in general I would recommend:

  • Do it on staging, test thoroughly, stress test it, etc.
  • Then take a backup of the live database and repeat the import process to make sure it will all work
  • Thoroughly test all functionality to see if it all works
  • Put live site in maintenance mode (personally I would set up a temporary server that only has a “please hold while we perform maintenance” message)
    • Run the import process and test again before you switch the live site back on

And of course, take backups of everything and make a plan to roll back (and test your rollback plan) before you do anything.

Is there a way I could temporarily disable Examine indexing during the import? Kevin thinks that might help speed things up a bit. While I never tested in production, I did do a test on staging of a uSync export/import of the base quest information (about 5200 records), and besides it taking about 40 minutes, the only problem I ran into was filling up the SQL transaction log, but that did not stop the import from completing.

I can’t recommend it. The more weird things you try to do, the more things might break :sweat_smile:

Ok, looks like I am kind of stuck between a rock, another rock, and a hard place with this project.

Do you think having my application write directly to the production site via API calls (after deep testing on local and staging) might be faster than using uSync for this case, or would I most likely hit the same slowdown issues from Examine and HTTP requests?

uSync’s developer (Kevin) has over a decade of experience importing things into Umbraco and making it consistent, bug-free, and as fast as it can be. As I noted earlier:

This is part of what makes it slow. I really don’t know how much Examine affects the performance, but I did see this article that might help:

The last thing I really want you to think about is: which of these content items are you realistically ACTUALLY going to update the content for? I don’t know at all what you’re building, but I always doubt that people are actively updating 60,000 content nodes.

I am in no way knocking what Kevin has done. I love uSync and have been using it for a few years. I was just not sure if a direct API method might be faster or not. I don’t know Umbraco well, so it was just a thought as I try to figure this all out.

Kevin is using direct API methods :wink:

For these nodes, updates would rarely happen once all the data is set. I do know there might be times in the future when they need to be updated, like when at one point the game removed all of a certain type of equipment, and I might have to make some changes to the SEO properties. But once the data is imported into my separate DB, 99.9% of changes would be auto-picked-up by my application and written to my DB, so the updates would be made via whatever method I end up using to import the data.

This conundrum of a large amount of data that will not often get updated is what led to the post. Going back over my first post, I see I forgot to include this bit of information, so I apologize for that oversight.

I had the feeling he was using the API but was not 100% sure on that. Thanks also for the link on disabling Examine. And this is where I show how little I know: when you say “nodes are large objects with all kinds of metadata”, I really don’t know what that means. Like, is a single node stored as a LOB in the SQL DB?

I’m not sure what an LOB is..

An Umbraco node consists of the node itself, plus a row in a different table for each property on the node, possibly additional relation rows, editing-history rows, and probably some other things I’m forgetting. This also makes it slower: it’s not just storing a single row in a single table; there are quite a few joins that need to happen.
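That multi-table layout is essentially entity-attribute-value storage. Here’s a minimal sketch of the idea (the table and column names are invented for illustration, not Umbraco’s real schema) showing why reading even one node takes joins:

```python
import sqlite3

# Simplified EAV-style layout: one row per node, one row per property
# value, plus version-history rows. All names here are illustrative.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE node     (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE property (node_id INTEGER, alias TEXT, value TEXT);
CREATE TABLE version  (node_id INTEGER, version INTEGER, saved_at TEXT);
""")
con.execute("INSERT INTO node VALUES (1, 'A Hero Rises')")
con.executemany("INSERT INTO property VALUES (1, ?, ?)",
                [("seoTitle", "A Hero Rises - Quests"),
                 ("walkthrough", "Talk to the elder...")])
con.execute("INSERT INTO version VALUES (1, 1, '2024-01-01')")

# Rendering one page means joining node -> properties (and often
# versions), so 60k nodes * N properties is far more rows and joins
# than 60k flat records in a single table.
rows = con.execute("""
    SELECT n.name, p.alias, p.value
    FROM node n JOIN property p ON p.node_id = n.id
    WHERE n.id = 1
""").fetchall()
print(rows)
```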

If you’re not going to edit most of the data, then we’re back to me telling you the data should not be in Umbraco. I can tell you’re trying to link a lot of things, and I don’t know what that needs to look like, but it could be properties on nodes, or actual SQL calls to your other database.

LOB is Large Object; basically, it is a single entry in a table that is so big it flows over the normal paging for SQL. What you’re describing is normal SQL design and not a LOB. But with what you are describing I understand what you are saying much better; I used to be a DBA, so now I see what you’re getting at with “metadata”.

When I say link things, I am talking about, say, a quest. That quest has items as rewards and NPCs involved, so it would link to those, and those items and NPCs would link back to the quest from their pages. Items can be crafted or bought at stores, so recipes would link to other items, items would link to stores as places to buy them, and so forth.
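As a sketch, those cross-links could live in the external DB as ordinary many-to-many tables, which gives you forward links and backlinks from the same rows (all names here are illustrative):

```python
import sqlite3

# Illustrative schema: quests link to reward items and involved NPCs
# through plain join tables. None of these names are a real schema.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE quest        (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE item         (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE npc          (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE quest_reward (quest_id INTEGER, item_id INTEGER);
CREATE TABLE quest_npc    (quest_id INTEGER, npc_id INTEGER);
""")
con.execute("INSERT INTO quest VALUES (1, 'The Lost Blade')")
con.execute("INSERT INTO item  VALUES (10, 'Iron Sword')")
con.execute("INSERT INTO npc   VALUES (20, 'Blacksmith')")
con.execute("INSERT INTO quest_reward VALUES (1, 10)")
con.execute("INSERT INTO quest_npc    VALUES (1, 20)")

# Forward link: the quest page lists its reward items.
rewards = [r[0] for r in con.execute("""
    SELECT i.name FROM item i
    JOIN quest_reward qr ON qr.item_id = i.id
    WHERE qr.quest_id = 1""")]

# Backlink: an item page lists every quest that rewards it,
# using the same join table filtered from the other side.
quests = [r[0] for r in con.execute("""
    SELECT q.name FROM quest q
    JOIN quest_reward qr ON qr.quest_id = q.id
    WHERE qr.item_id = 10""")]
print(rewards, quests)
```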

I understand it is possible to create a custom controller or something that would let me handle URLs without nodes existing, and that would let me make custom Razor pages with custom code to query the DB (before I found Umbraco I was doing Razor MVC and doing it all this way). My concern with linking, though, is making sure the links are always good links, like if a page gets moved, or if someone writing/updating a guide wants to link to an item or NPC. Since there is no node for it, Umbraco won’t have a picker for it or maintain the links if URLs change. I would also have no idea how to implement site searching. Right now the site search uses the built-in indexes, but if I have a bunch of data in a secondary DB, I don’t know how I could get that into indexing and searching as well.
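One common way to keep links good without nodes is to store them by stable ID rather than by URL, and resolve the URL at render time from the external DB; a moved page then only changes the lookup, not the stored content. A minimal sketch of the idea (the token format and lookup table are invented for illustration):

```python
import re

# Pretend URL index built from the external database; in practice this
# would be a SQL query keyed on entity type and id. Invented data.
url_index = {
    ("item", 10): "/items/iron-sword",
    ("npc", 20): "/npcs/blacksmith",
}

def resolve_links(text: str) -> str:
    """Replace tokens like [[item:10|Iron Sword]] with live <a> links.

    Because the token stores the stable ID, changing an entity's URL
    only requires updating url_index, never the stored guide text.
    """
    def repl(m):
        kind, id_, label = m.group(1), int(m.group(2)), m.group(3)
        url = url_index.get((kind, id_), "#")
        return f'<a href="{url}">{label}</a>'
    return re.sub(r"\[\[(\w+):(\d+)\|([^\]]+)\]\]", repl, text)

html = resolve_links("Reward: [[item:10|Iron Sword]] from [[npc:20|the smith]].")
print(html)
```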

BTW, I really appreciate your insight into all this and your trying to both understand what I am doing and help me understand things too. I am not very good at getting the words out of my head and into text, so I apologize if the things I write are confusing or hard to understand.

Should you create content pages? It’s up to you. I didn’t.

I can relate a somewhat similar context and tell you what I did. We both have data that is in a separate system. In your case gaming data, in my case environmental data (rainfall, wind speed etc from various sites). My solution was to create a set of APIs that is completely separate from Umbraco that works with the data and javascript that can be referenced within Umbraco (or any other website).

This of course has some problems that I needed to resolve - the type of problems that you are trying to solve by converting data into pages.

  1. Each page has a separate URL
  2. Each URL should be able to be bookmarked
  3. Each URL should have a title relating to the content
  • Search engines should be able to access the available set of URLs

Solutions:

  1. Query parameters are used for every selection
    e.g. site?module=rivers&collection=waterquality&site=awanui&measurement=temperature
  2. Display what the URL specifies as well as setting the URL depending on what is selected
  3. Use javascript to set an appropriate title
  4. This is a bit harder. I have a scheduled task (using Hangfire) that runs periodically to populate the sitemap. This includes all of the URLs down to the level of measurement in terms of the environmental data.
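The scheduled sitemap step (point 4) can be sketched like this, using the query-parameter scheme from the example above; the base URL and data rows are invented for illustration:

```python
from xml.sax.saxutils import escape

BASE = "https://example.org/site"  # invented base URL

# Invented (module, collection, site, measurement) rows; in practice
# these would come from a query against the external data.
rows = [
    ("rivers", "waterquality", "awanui", "temperature"),
    ("rivers", "waterquality", "awanui", "turbidity"),
]

def sitemap(entries):
    """Emit one sitemap <url> entry per data row, escaping the '&'
    separators so the XML stays well-formed."""
    urls = []
    for module, collection, site, measurement in entries:
        loc = (f"{BASE}?module={module}&collection={collection}"
               f"&site={site}&measurement={measurement}")
        urls.append(f"  <url><loc>{escape(loc)}</loc></url>")
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            + "\n".join(urls) + "\n</urlset>")

print(sitemap(rows))
```

A scheduler like Hangfire would just call something like this periodically and write the result to the sitemap file the site serves.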