
Thursday, December 8, 2022

ESX Clusterbombed FOG Update

 

BIG problem – our ESX cluster has fallen over. Kind of. Not entirely – but enough to be a complete pain.

The symptom – our student webserver called ‘studentnet’ (which also hosts lots of student storage) stopped working. Actually, for a while I had suspected something might happen, because we had been observing very high memory usage (95% on a server with ~132GB of memory?!), all of it being eaten up by studentnet, with high CPU usage spikes too. A student may have been running something that went nuts – but regardless, I went to log into the VM client to check what was happening. This is where it all started to go really wrong.

The vCenter host was down. But the internal wiki site we host was up. Really weird stuff. I tried to log into the individual hosts that run the VMs and could get in fine, although studentnet just wasn’t responding at all. At some point – stupidly – I decided to restart the host machines, as one appeared not to respond (it turned out later that this was just Active Directory permissions that either were never set up or have somehow – at the time of writing – died a horrible death).

This was stupid because, mostly, it was unnecessary – but crucially it meant all of the VMs that *were* running were now turned off. Very bad news in one or two cases – as I will try to cover later (when I talk about turning certain servers back on) – because any servers that had been running in memory now had to reload from storage.

And the storage is where the problem lies.

When my previous boss left in December, I asked him what we should do to move forward; what we should carry on working on if he had stayed around. He was quite clear that storage is the one thing that needs attention and a proper solution, because – well – we have no backups. We have told everyone who uses studentnet this – as we have also told them that it isn’t a primary storage location in the first place (and I have a feeling many were using it as one, as well as running files directly from their user area, potentially causing lots of constant I/O activity) – but that is really of little comfort to those who needed to submit assignments last night.

For our two ESX hosts (Dell PowerEdge R815s with 16 CPUs and 132GB RAM each) we run a Dell MD3000i storage array with two MD1000 enclosures attached. This is probably about 7 years old now – and although my experience with storage management like this is limited, I have a feeling we should at least be looking at some sort of backup, if not a replacement.

Our compact ESX cluster with storage nodes

Nevertheless, performance has been OK – even if sometimes intermittent. However, there is a “procedure” for re-initialising the array in the event of a power cut or if something just goes wrong – essentially disconnecting the head node (the MD3000i – 15 slower but big 1TB SATA disks) and powering it off, before then powering off the MD1000 enclosures (each with 15 faster but smaller 300GB SAS disks). I should really read up more about SATA and SAS drives – but for now, the main thing is to get things running again. Before doing this, you should really power off all VMs, as this stops I/O activity – a VM will otherwise keep running in memory (just without the ability to write back to disk) even if you disconnect the hosts from the storage array.

I realised after the hosts had been restarted (as I had stupidly done earlier) where the problem lay. On both hosts, several VMs failed to even be recognised – including studentnet and, frustratingly, the vCenter machine (seriously, why is this running as a virtual machine stored on the flakey storage array!!). And these all happen to be stored on our MD1000 enclosure 2.

The storage enclosures

Going back to the server room, I had a look at the lights on the disks. Several orange lights. That was probably bad. It was very bad. You probably shouldn’t do this, but I simply pulled out and reseated the disks that had orange lights on – and several of them actually went green again. I thought that might be an improvement.

This doesn’t bode well…

And actually, it was, to some degree. The first enclosure was now orange-light-free – with the main unit’s LED now showing blue rather than orange. I thought this might be good. But opening Dell’s Modular Disk Storage Manager tool (useful, though it could be more visual and show more log information – or maybe I just can’t find it) showed me the true extent of the problem.


Yeah, things are pretty bad. FFD2 (the second fast enclosure) has failed. In fact, drives 0, 2, 4, 7 and 13 (which was a spare) have all failed.


The strange thing, though, is that even with the current replacement mapping, RAID 6 should tolerate 2 drive failures and still operate. I would have thought that means you could survive six failures on one of the arrays, using four hotspares to rebuild data and accepting two drives as total failures. But again – my experience of storage management is limited, and I have no real idea how hotspares are actually used within a RAID scenario. (As I understand it, a hotspare only helps if its rebuild completes before the next drive dies – so five near-simultaneous failures is more than RAID 6 plus spares can absorb.)

Taking out the dead drives…

The next thing I’m off to try is restarting the arrays again. This really needs sorting as soon as possible; the total loss of that enclosure is just not something that can happen. If hot spares fail, I’ll replace them with disks from another machine that also has 300GB 10k SAS drives in it (what can go wrong?), one at a time, and see what happens.

Whatever happens after this, a key thing we need to look into is buying more storage, as well as the backup and maintenance for it. And, for now, FOG has to be on hold too.

Looks like the storage array is dead. After a bunch of messing around a couple of days ago, it really is apparent that we have lost the FFD2 enclosure.

With it, we lose a few servers, but we also gain a load of storage disks. Until we manage to come up with a new storage solution that has backups, I’m taking all the old SAS drives out to use as hotspares for FFD1. I have a feeling we have lots of 15k RPM SAS drives lying around that were used in other servers – more recently – and of the same brand, which we can use to rebuild FFD2 again. It should work. If not, it’ll be a learning experience.

I ended up familiarising myself a lot with Dell’s Modular Disk Storage Manager (MDSM) software and found out the correct way to assign hotspares and replace drives in the enclosure. A lot of messing around, unplugging and restarting took place on Tuesday, eventually resulting in a hot spare being designated as a physical replacement on another enclosure. I had actually written a good amount up about this, but it was written in Notepad on a virtual machine that subsequently got restarted when – at some unknown point – it was decided to restart everything that was already running. Frustrating. But not the end of the world.

Moving forward, what needs to be done now is:

  • Have a backup solution:
    • If a server fails, if the hosts fail, or if everything is lost, we need to be able to – at worst – rebuild and reconfigure those servers. Each server should have a designated backup plan associated with it.
    • Designate some replacement hot spare drives.
    • Purchase a new storage array and an appropriate backup, perhaps with something like a scheduled daily backup of the system.
    • Ideally, the content from our internal wiki should be mirrored elsewhere so that, in the event of a disaster, we can recap how to fix it.
  • Maintain the storage array and the ESX hosts more closely. Someone needs to monitor alarms as they appear and be informed of any storage array issues. I also need to look into why we no longer receive the support emails automatically generated by alarms on the storage array (this used to happen).
  • Rebuild the vCenter server – probably on a physical host rather than a virtual one. Will need to look into that.

For each of these points, I would probably make a new post – but this is just one part of what I am working with. FOG and the redeployment of our labs is also a priority, as are some other projects I have been working on lately. To be continued!

Last week was spent trying to get our ESX cluster back up and working, so now it’s back onto FOG. Towards the end of the week, I did manage to spend some time on this again. I changed the switch configurations for the three rooms whose network we manage ourselves to point to undionly.kpxe instead of pxelinux.0, which means using iPXE (a bit better, and able to use other protocols besides TFTP, such as HTTP). Whether this provides any speed difference remains to be seen.
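For anyone with a similar setup, the change on our Cisco switches amounts to something like this in the DHCP pool (a sketch – the pool name and addresses here are placeholders, not our real ones):

ip dhcp pool LAB-A
 network 10.0.1.0 255.255.255.0
 default-router 10.0.1.1
 ! option 66 – the FOG/TFTP server
 next-server 10.0.1.5
 ! option 67 – the boot file name
 bootfile undionly.kpxe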

Host registration

For our own rooms, this small change actually worked and the FOG boot menu appeared for a new, unregistered host.

The timeout option for the menu can be changed from the FOG management webpage – for us it is 3 seconds but, after registration has been performed, I will likely reduce it to 0. I am pretty sure the timeout can also be made to depend on the host’s registration status.

I spent some time working with the rest of the team here walking through the procedure for registering new host machines (since I decided not to bother with exporting and importing a saved list from the previous FOG installation) and, with only five of us left in the team, it was important that we all know how to use FOG in case one of us isn’t here. The registration of ~500 PCs will be a monumental task, but with some tips and tricks, it shouldn’t take too long. When registering a host, all we really need to do is give it a name – the rest can be edited (including the name, to be fair) in the FOG management menu. A quick way to do this is to just enter the name, hold down the enter key and move on to the next host. Because I haven’t yet defined any groups for the hosts to go into, I can add them manually later – however, it may be a good idea to modify the registration options not only to strip out a load of the useless options but also to extract the group to add the PC into from the host name (as each PC is numbered according to the room it is in).


One thing to add here is that if your organisation, like ours, uses asset tags on its systems, these may also have been recorded by the OEM onto the motherboard. If so, the asset tag (which, for example, Dell would provide for their systems) is uploaded with the rest of the hardware inventory and can be viewed on the FOG management webpage under each host’s hardware information. When it comes to auditing your hardware, this can be very handy (as it was when we once had to record the MAC addresses of every new PC we had ordered – presumably someone had forgotten to do this at an earlier stage, before their arrival with us!).


And here we have a fully registered host! If you get the name wrong (as will inevitably happen in the process of manually adding so many hosts), you can actually delete the host using “quick removal” here, which then takes you back to this menu again.

Bootfiles – Pxelinux, Syslinux and Undionly

Now to try out the other labs!


As suspected, this didn’t work on the rest of the rooms we manage, unfortunately. After hanging for a while on PXE boot, the machines in those labs time out saying “PXE-E53: No boot filename received.” This can have a few causes, but generally it is because the PXE server didn’t respond, or because no boot file was specified even though the server could be contacted.

Or, now that we have changed to undionly.kpxe, perhaps the bootfile specified in DHCP option 67 is incorrect. FOG now uses undionly.kpxe as its bootfile. I was a bit confused by what this was, so I’ve been looking around, and this article answers it through part of its explanation of PXELinux. It seems that Etherboot’s undionly.kpxe and Syslinux’s pxelinux.0 are combined in the article’s scenario, as they serve different purposes, but FOG has replaced the latter with the former rather than using both?

I decided to actually check the FOG site out. It explains it quite well and, through a link to an Etherboot blog, it seems that pxelinux.0 IS still used, but has been moved to a different stage of the loading chain. It’s generated from .kkpxe files, and the undionly.kpxe file is used as a kind of generic PXE loader. The key thing to note (and this post by Tom Elliott* back in February details some of the motivations too) is that iPXE can use different methods of delivery rather than just TFTP – and apparently this can make things faster if done over HTTP (as well as enabling some cool PHP magic and other things too). *Tom now appears to be one of the main FOG people as, after the change from 0.32, he is listed in the FOG credits as the third creator.

My initial assumption was that, because we can only manage the DHCP pools for three rooms, the rest of the labs’ DHCP pools were unmodifiable by us and, therefore, would need to be changed by ICT services.

However, the only thing that ever had to be changed on the rest of the University network was that, on the core switches, the IP helper address needed to be set for each VLAN that we wanted FOG to work on. And this hadn’t changed at all – so I couldn’t work out what the issue would be…

proxyDHCP

Then I remembered something – we had to actually configure FOG as a proxyDHCP server. It isn’t that way by default. For this to work, we can use dnsmasq – a simple install, plus adding a configuration file called ltsp.conf to the /etc/dnsmasq.d directory. Here, certain options are configured to actually enable the proxyDHCP server. The example configuration is commented, so I won’t detail it here. However, a few things to note:

  • Each IP address listed represents a network from which the proxyDHCP server will respond to requests – without listing them, the FOG server won’t respond to any requests from those subnets.
  • You can subnet it however you like – we could use one broad entry and cover the whole University – but only the subnets where the University network has the IP helper address configured would get FOG booting anyway, so I decided we should list each subnet we want FOG booting on (and so be able to disable each subnet individually).
  • After you add a new subnet for FOG to serve, save and exit the configuration, then run “service dnsmasq restart”.

So in order for this to all work in an environment where you have no access to the DHCP configurations, the following had to be configured:

  • The IP helper address of the proxyDHCP/FOG server had to be added on the core switch, where the VLANs are specified (a sketch of this follows below)
  • ltsp.conf had to be configured on the FOG server running dnsmasq
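For the first point, the core switch side is a one-liner per VLAN (Cisco IOS syntax; the VLAN number and address are placeholders):

interface Vlan210
 ! forward DHCP/PXE broadcasts on this VLAN to the FOG server
 ip helper-address 10.0.1.5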

However, this didn’t help at all.

failboot2

This turned out to be because, of course, pxelinux.0 is no longer used, and the FOG wiki instructs you to change a couple of lines to point to undionly.kpxe.

This line:

dhcp-boot=pxelinux.0

is now,

dhcp-boot=undionly.kpxe,,x.x.x.x

where x.x.x.x is the FOG server’s IP. Note that the IP is necessary; without it, the boot fails with an error.


and the line:

pxe-service=X86PC, "Boot from network", pxelinux

is now

pxe-service=X86PC, "Boot from network", undionly
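Putting it all together, the relevant part of our ltsp.conf ends up looking something like this (a sketch – the FOG server IP and subnets are placeholders for ours):

# /etc/dnsmasq.d/ltsp.conf (excerpt)
# we only want the proxyDHCP part, so disable DNS
port=0
# the boot file, plus the FOG server's IP
dhcp-boot=undionly.kpxe,,10.0.1.5
# one 'proxy' range per subnet that FOG should serve
dhcp-range=10.0.1.0,proxy
dhcp-range=10.0.2.0,proxy
pxe-service=X86PC, "Boot from network", undionly
# serve the boot file over dnsmasq's built-in TFTP
enable-tftp
tftp-root=/tftpboot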

I saved, restarted and now, finally, it works!


But why did it work on our rooms?

As I remember from before, the labs we manage (three rooms) are served by a stack of Cisco switches where we could add next-server and the bootfile. But the rest of the University uses Windows DHCP servers, and options 66 and 67 were never configured for us there. So why were our rooms able to PXE-boot once we configured options 66 and 67? By having our single DHCP pool include all the details for the FOG server, clients are explicitly pointed to the FOG server and explicitly told the file name to get. Because the TFTP boot folder has already been set up in FOG, the request for the file is directed to that folder. This wouldn’t normally happen across the rest of the University network, as the DHCP servers don’t point to our TFTP boot server at all. Even with the IP helper address in place it still didn’t work – because the proxyDHCP service wasn’t running (and therefore it wouldn’t respond to any DHCP requests). This is why dnsmasq was used – to run a DHCP service on the FOG system, but without actually giving out any IP addresses.

So if this worked originally for all of the subnets we configured in ltsp.conf, why couldn’t we just configure it for our own labs? The IP ranges were there, yet they weren’t serving the very labs we maintain all of the configuration for. I will update this post later after looking for a possible original misconfiguration.

Next time: I will try and upload and download a FOG image, with attention to ease of use, speed and how it compares to my experiences with 0.32.

With FOG registration tested and verified to be working, it’s time to move onto actually testing image uploading and downloading. If that doesn’t work, it’s game over. For Part 5, I will deal with how to upload an image to FOG and the process I take to do this from scratch.

Creating and uploading an image using FOG

There are several useful links on the FOG wiki already which outline the main steps.

To facilitate all of this, a dedicated Dell 2950 server is running ESX 5.1 so that we can create a virtual machine to emulate our default build.


Why the DHCP server?

The DHCP server runs purely to give out an address to the single virtual machine we have running. Within our server room, all servers are configured with static addresses, so DHCP isn’t otherwise needed.

Except that, in order to boot from FOG, you need DHCP to be running (and access to a proxyDHCP service). So this server simply deals out a single address to one client with a specific MAC address – that stops anyone else accidentally being given the address, and it means we can now boot from FOG on our virtual machine.
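If you were doing this with dnsmasq (a sketch under that assumption – the MAC and addresses are made up), the whole thing boils down to a one-address pool reserved for a single MAC:

# a one-address 'pool', handed out on 12-hour leases
dhcp-range=10.0.3.10,10.0.3.10,12h
# ...and only this MAC address may have it
dhcp-host=00:50:56:aa:bb:cc,10.0.3.10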

Why use a virtual machine?

During the sysprep process, all hardware information can be stripped out using the /generalize switch, so the platform and hardware are largely irrelevant. However, the advantage of using a virtual machine is that, not only can it be accessed remotely, but it can also have its state saved in snapshots.

This makes it easy to roll back in time to an earlier version where, say, an older version of some software is used and, crucially, after sysprep has been performed, the image can continue being edited later as if the sysprep process had never even been run.

The Process

For anyone thinking of doing the same, I would suggest reading the above FOG guides, as they get to the point and give you pretty much all the steps you need. But here’s how I did – and still do – it myself.

I started with a clean Windows 7 build; at the stage where you would actually enter any details, you can enter audit mode by pressing Ctrl+Shift+F3. While in audit mode, you are modifying the default profile (although the user folder is called Administrator), so everything you do gets replicated to all users.

Note that this can be an issue in some cases. For example, I found out that Opnet writes files to \AppData\Local and explicitly saves the user as Administrator. This has to be manually changed, otherwise all subsequent profiles will try to access the Administrator user area. Similarly, if Eclipse is launched, it will save the user in its preferences as Administrator, meaning that any subsequent users are unable to launch Eclipse at all. I will make a separate post about this in future…

I install most of the software we use manually because, with a 1Gbps network connection, a ~70GB system installation can take somewhere from 5 to 15 minutes to download onto a host machine, depending on how the intermediary switches are behaving. The alternative way to image systems is to install nothing beyond this base installation and, instead, use FOG’s Snapin functionality to push packages out remotely. However, it was felt that, given the speed of multicasting, it is far more efficient to restore a saved image of a system onto all PCs at once than to have each one individually pull down packages that may, or may not, fail.

At various points, Windows updates are done, software is updated and changed, and restarts are made. I once found an issue with the Windows Media Player Network Sharing service, which caused the sysprep process to fail, so although doing Windows updates on the base image is fine, I make snapshots along the way. These are crucial, actually, and are the main motivation for using a virtual machine for this process. It means we can delay Windows activation until the end, and we can undo huge volumes of changes within seconds, rolling back to a different point in time as necessary.


Of course, alongside this, I keep a changelog and note down what I installed, uninstalled and changed at each point in time. This is really important, because uninstalling one package can have effects on others, especially if they use Visual C++ redistributable packages (one application may rely on a specific, yet old, version of the redistributable, as Broadcom seem to have managed with some of their wireless/Bluetooth software).

When ready to deploy, I then take a snapshot of the system as it currently stands (ideally saving the contents of memory) and install the FOG service. This is a cool client-side application that handles host naming, Active Directory registration, printing and snapins (among other things), with snapins being one of the best parts of FOG. Some packages, such as Unity, can be used for free only by individuals. Unity is used at our University, but only on ~50 of the PCs we own. Therefore, we could install it everywhere, but not legally. Snapins mean we can deploy Unity remotely to only specific PCs, or run commands on those PCs.

Finally, I use Sysprep. I used to use Fogprep, which, as far as I can tell, made certain registry changes to prepare the system to be imaged, but FOG now seems to do this for you during the image restore process. Sysprep is a Microsoft tool for preparing the system for first use; the OOBE option presents the user with a first-use screen (like when you buy a new PC) and the “generalize” option removes all hardware information from the system, so that new hardware can be detected. For this to work with all hardware types, I specified in the registry (under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion) the additional folders to search for drivers that I know the system needs (usually the extracted NVIDIA drivers and each motherboard driver type). Now, when Windows is first booted after this process has been run, it will scan for drivers for the attached hardware.
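For reference, those search folders live in the DevicePath value under that key, so the change can be scripted rather than clicked through – a sketch, with made-up driver folders:

rem Append extra driver folders to Windows' driver search path
reg add "HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion" /v DevicePath /t REG_EXPAND_SZ /d "C:\Windows\inf;C:\Drivers\NVIDIA;C:\Drivers\Chipset" /f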

Sysprep can also use an “answer” file to automate most of the process. So, for the OOBE, you can skip all of the language, region and even licence key steps if you have all of this included in a sort of configuration file, usually named “unattend.xml”. Back in Windows XP this was much the same, but now the tool to create this file is included in the Windows Automated Installation Kit (WAIK). However, you can pick this file apart and make your own if you understand a bit about XML, so you can specify a command to activate Windows through KMS, specify locale options, choose to skip the rearm process (to stop Windows reactivating; it can only be rearmed 3 times ever!) and a range of other things.
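As an illustration, the OOBE part of such a file looks something like this (a fragment only, from memory – a real unattend.xml needs the other passes and settings too):

<settings pass="oobeSystem">
  <component name="Microsoft-Windows-Shell-Setup" processorArchitecture="amd64"
             publicKeyToken="31bf3856ad364e35" language="neutral" versionScope="nonSxS">
    <OOBE>
      <!-- skip the licence screen -->
      <HideEULAPage>true</HideEULAPage>
      <!-- skip the network location prompt -->
      <NetworkLocation>Work</NetworkLocation>
      <!-- use the recommended Windows Update settings -->
      <ProtectYourPC>1</ProtectYourPC>
    </OOBE>
    <TimeZone>GMT Standard Time</TimeZone>
  </component>
</settings>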

The following command does the above and will shut Windows down afterwards.

C:\Windows\System32\sysprep\sysprep.exe /oobe /generalize /shutdown /unattend:unattend.xml

Note that, from this point, starting the “sealed” machine will initiate everything and set Windows up as it would be had you gone through the whole customisation process. For this reason, I make a snapshot right before initiating Sysprep – although if your virtual machine is thick-provisioned, snapshots can get HUGE.

Now, to actually make the host upload its current image, the host has to be associated with an image. Whatever is uploaded will replace anything that already exists for that image – so make sure you really want to overwrite what is there! From the FOG management webpage, navigate to the host you are uploading from under “Host management”, go to basic tasks and select “Upload”.

 

When you start this, the host – during PXE boot – will see that there is a job for it to do and it will then start uploading the contents of the drive to the FOG server. This is usually a very smooth process – so long as the virtual machine is set up to network boot (otherwise, you will just boot into Windows..).

I restarted the VM and the FOG imaging screen appeared.


So far so good. I preferred the old look, to be honest, and the refresh rate of the information is every second, so it “feels” sluggish – but actually it shouldn’t be any different. However, I noticed that the speed was oddly slow. In fact, it only climbed to about ~400MB/min – which seems really slow when you consider it should have a 1Gbps link; it’s acting as though it’s only a 100Mbps link! For comparison, it used to just touch 1000MB/min (and lab PCs could top 5000MB/min, which still seems slow, as I reckon that makes each connection only use around 600Mbps – but accounting for switching, perhaps it is not too bad).
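As a rough sanity check on those numbers (just unit conversion, nothing measured):

400 MB/min ÷ 60 × 8 ≈ 53 Mbps (nowhere near a 1Gbps link)
1000 MB/min ≈ 133 Mbps
5000 MB/min ≈ 667 Mbps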

However, by default, the PIGZ compression rate is set to “9” under the FOG settings. Changing it to 0 resulted in an upload speed that approached what I was used to seeing.


However, this comes at a cost.


The space taken up on the server by the image is around 90GB – compression at the maximum setting would halve the upload speed but also halve the stored size. It’s a trade-off for quick image creation – and with over 1TB of storage, I am happy to use no compression for now. In future, though, it’ll have to be maximum compression when juggling many images. Note: I tested downloading a compressed versus an uncompressed image with no speed difference at all – so there is no gain or change in speed when downloading to clients. I’d be interested to find out more about why the speeds vary so much and never hit their potential maximum!

Anyhow, this will take a few hours to upload but once this is done, the real test will be downloading to a host and multicasting! And while we are waiting, one of the new cool things in the FOG browser is being able to see the client speed and progress (which didn’t seem to work in 0.32 for me..)

Active Directory settings in FOG 1.0+

Back in FOG Update – Part 2, I said that you could just copy and paste settings from the old FOG to the new one. Except that, as of version 1.0, the username for the account that adds machines to Active Directory should NOT contain the domain. In other words: Domain\User should now just be User.

So that was why computers were no longer being joined to a domain after imaging had finished..

Further to the previous post, everything seems to have been a success. I wiped out the list of hosts from the FOG database when I installed the server from scratch, and so have been going around all of our PCs re-registering them from the FOG host registration menu, using a naming convention (<department><room>-<assettag>). As they are all already in Active Directory, the FOG script that joins them to AD (which initially didn’t work, see the previous post!) sets them back into their groups again.

Speeds seem to be around 2.5GB/minute – which again still seems slow – but multicasting works as it did before, which is absolutely fine. We recently had some switch upgrades to Cisco 3850s, which should make all links 1Gbps now. More testing of our other labs will take place over the coming weeks. But as far as FOG is concerned, this is likely to be the last post on it for a while (or, at least, it should be). The issue to cover now will be snapins.

FOG versus Desktop Management software

Currently, our base image for Windows 7 has all of our software installed. This weighs in at around 90GB and means that, once deployed, a computer has everything it needs for every possible lesson taught in our labs. Additionally, once completed, every PC is configured the same; with ZENworks, our University would deploy a basic copy of Windows XP and the packages for a given lab would then be added and downloaded.

The problems we faced were that we wanted to use Windows 7, about 10% of all PCs would fail to image, and even the ones that did image were inconsistent in which packages they had managed to receive. The University now uses LANDesk, Windows 7 and better infrastructure, but our department is still using our own system – FOG – and we have been quite happy with the process we have in place.

One problem with our method is that the image is big. It’s HUGE. A very basic, cut-down Windows 7 image is a fraction of the size – but that means having to deploy all the software to each machine which, really, works out no quicker (it will likely take longer, too: multicasting a 90GB image to potentially every PC in our labs is just as quick as sending it to a single PC, whereas the alternative is to transmit the same packages to each PC individually). So this was our logic behind making a single large image. But aside from the upload and download times for the image, the real issue is changes that might need to be made to the systems: adding, removing or changing software.

ZENworks had a bunch of management features and LANDesk seems to have quite a number, too. FOG, on the other hand, really focuses on imaging and not much more; there are some useful extras, such as remote initiation of system reimaging, Wake-on-LAN, remote virus scanning, memory testing, user login recording and system inventorying – but it isn’t a management tool on the level of LANDesk, which is something we may have to address in future. However, FOG does have a service component present on all client machines that checks in with the FOG server every so often to see if it has any active tasks. This FOG client service has modules that do things such as hostname changing and Active Directory registration, logging on and off, printer management and so on. It is expandable simply by adding our own modules written in C# (so we could replicate lots of management functionality ourselves, if we could write it).

However, the one really cool thing that I hadn’t really explored until now is the ability to deploy packages through “snapins”. Properly managed, snapins accomplish a few things we need: remote deployment, removal and uninstallation of software on multiple PCs, and the ability to run commands remotely on systems. This means we can now update software without having to redeploy an image or manage those workstations by hand (although the same changes would still be replicated to the base image for updating future PCs).

Snapins with FOG

The first thing to note is that, actually, I have used snapins before. However, they were just command-line .bat files which would essentially remotely initiate installers that were stored locally on the PCs. One example is a Sophos package we have which, once installed, attempts to update itself. It should then restart the PC, but a timer needs to be added first. This batch file worked quite well. However, I couldn’t work out how to run an executable .MSI or .EXE by uploading it. This is where I then found this guide, which walks through how to make a snapin from start to finish using 7-Zip SFX Maker.
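As an aside, a snapin batch script of that sort can be very simple – something like this hypothetical sketch (the installer name and delay are made up):

@echo off
rem Silently install an MSI bundled alongside this script
msiexec /i "%~dp0ExamplePackage.msi" /qn /norestart
rem Reboot 60 seconds later so the install can settle
shutdown /r /t 60 /c "Rebooting to finish software installation"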

Essentially, however, the simplest explanation is (for SFX 3.3):

  • Have every file you want in a 7-Zip archive (including your batch script or cmd file or whatever)
  • Open SFX Maker (probably stashed under the Program Files (x86) folder) and drag and drop your .7z file onto the big empty space. Check the following:
    • Dialogs->General->Tick “Extract to temporary folder”
    • Dialogs->General->Tick “Delete SFX file after extraction”
    • Dialogs->ExtractPath->Untick “Allow user to change path”
  • Navigate to the “Tasks” tab, click the big plus sign, click “Run Program” and just add the file name of your batch script or cmd file at the end of whatever is in the %%T box (e.g. myfile.bat)
  • Click “Make SFX”

The file will be output to the same directory as the .7z file. To add the new snapin, under FOG management, click the box at the top.


However, one important thing to note before uploading the snapin is that, by default, the snapin file size limit is 8MB. Editing /etc/php5/apache2/php.ini, I changed the following values:

memory_limit = 2000M          ; 2GB memory usage limit
post_max_size = 8000M         ; maximum POST size
upload_max_filesize = 8000M   ; 8GB upload maximum!


This should give us no problems with any of the packages that are bigger than 8MB. Afterwards, the web server needs to be restarted with “sudo /etc/init.d/apache2 restart” (and make sure it’s M in the PHP file – don’t write MB, otherwise PHP gets upset!).

After uploading the snapin to FOG, you can assign it just like an image: either to an individual host or to multiple hosts through groups. Within a host, you can now actually remove snapins much more easily. You can also choose to deploy a single snapin to a host immediately, or all of them; generally, I would assign snapins to a group (such as a room that one piece of software is licensed for) and any hosts in it would receive this software when the image is initially deployed, with the snapins then deployed automatically post-image. In the case of the example used here, though, Questionmark is a piece of software that we have, relatively late on, been tasked with installing for another department in the University. Automation of certain uninstallations and updates should be possible this way too – hopefully in a future update I’ll be able to talk about making custom modules for FOG, or any ways in which snapins are tweaked and deployed further.

But so far, FOG 1.2 seems to be running absolutely fine!

Recovering a deleted fog user – or – resetting a user’s password

Today is off to a frustrating start. I couldn’t log into the web UI anymore – it kept telling me I had an invalid password. I remembered that a while ago I had a problem with my username – it turned out that was because I had deleted it when trying to rename it; this time, however, I had simply changed the password from the default (which is ‘password’!).

I logged into the database and decided to check it out for myself to see what was going on. To do this, I just typed:

sudo mysql -u root -p fog

That prompts me for my password and opens up the “fog” database. You can then see all the tables using plain SQL statements like SHOW TABLES; but make sure you add the semicolon afterwards (and not just because, here, it makes grammatical sense to).

Anyhow, if I do a SELECT * FROM users; statement, it should show all my users – and there should be two: one for quick imaging and one for my fog user.


…and there isn’t. WHAT’S WRONG WITH YOU, FOG? I think I said that out loud, too. OK, no big deal – we can just add another one. Because I can’t be bothered to type out too much, and because the uId gets added automatically, I just want to insert a new user with the name “fog” and, to make sure it is a proper user that can do everything, set the type to 0 (1 = the inability to do anything except perform quick image tasks).

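From memory, the statement was something along these lines (column names as I understand the FOG schema – worth checking with DESCRIBE users; first):

INSERT INTO users (uName, uType) VALUES ('fog', 0);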

OK, so the user is back now – all I have to do then is update that user, setting the password to whatever I want (in single ‘quotes’) and encrypting it with MD5.

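Again, roughly (same assumed column names):

UPDATE users SET uPass = MD5('MyNewPassword') WHERE uName = 'fog';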

And it works! I can log in again finally. Hopefully it won’t keep doing this – if so, I’ll write and rant more about it and see if I can figure out why!

Looking to install FOG 1.2.0 from scratch? Check out my more updated post.

Yesterday was mostly spent backing up the original FOG installation, just in case this all goes horribly wrong (we do actually need to use this server, after all). This was taking absolutely forever, so I gave up and only backed up the server minus the images I made over the last year or so (except the default one we are currently using). Plus, I wanted to actually get on and install this thing this week! I headed over to pendrivelinux.com and downloaded a cool piece of software that let me install Linux to a USB stick to boot from. Absolutely fine, except the Dell PowerEdge 2950 server I was using doesn’t seem to do anything at all when I put the pendrive in. It just refuses to boot from it and, even after messing with the partition table, it eventually gave up and said invalid operating system.

Plan B today, therefore, was to put in a CD (from an external CD drive). This worked fine and, after a few reinstalls (the first time I messed up the RAID configuration and the second time I forgot to add a user or set a password for root), everything else has so far gone fine. It is a very barebones Linux system – the only two things I did were to upgrade/update the system and packages and to install SSH for remote access from my office (it’s a lovely cold room, but I can’t spend all day up there. Well, on a hot day, I very well could..).

The next phase is to install FOG using this guide [edit: you may want to use this guide instead]. It seems to be one of the most unbelievably straightforward processes – although the network has already been set up for our FOG server. Except that, now, it seems the original pxelinux.0 has become undionly.kpxe. pxelinux.0 was the bootfile specified in some DHCP environments’ option 67 setting. The three switches that I set up to support our networking labs have this set, meaning I will probably have to change it at some point. I’ll write that up next time – or perhaps it will just work. The FOG community will know, either way (and for anyone thinking of using FOG, it has a very active and helpful forum). Finally, after all of the LAMP stuff had been set up, I was told to go to the web management page. Luckily, as the IP and hostname are all still the same, I just had to open my browser and navigate to my FOG page tab.


New FOG! Yay!

So first impressions are that this is great – actually installing a server from scratch and putting FOG onto it took perhaps a couple of hours tops (including burning a CD and downloading Ubuntu 14). It looks very much like 0.32 looked, so I know where everything is.


I still had a tab open from earlier today, when the old server was still running. Useful, as I can just copy and paste all of the settings from one tab to the other.

Configuration from now on can be done through the browser for the most part. The settings look pretty much as they did in 0.32 – although, sneaked in, is an option to donate CPU time to mine cryptocurrency for FOG donations. Kind of inefficient, to be honest – the power and heat will cost far more than the coins are worth at Bitcoin’s current difficulty; you would probably do better donating half as much in cash. But it’s a nice idea, especially as, at the time of writing, there are supposedly ~4500 FOG servers – and each one could have many hundreds of hosts all performing mining tasks.

Configuration aside, there is just one problem now – the FOG database. Everything is gone: all registered hosts and groups. To be honest, this isn’t really too big an issue, as I think it is probably time to go around and inventory our computers again. Several computers have been moved, changed or are no longer used – plus it gives us a chance to check every computer manually and verify that network/video/keyboard connectivity works.

So that was quite easy – so far. The next step is to test out what happens with network booting (IT services are in control of the majority of the network infrastructure here and are therefore the ones who set the DHCP options). Stay tuned! (like anyone is reading at this point!)

Edit 12/09 – You can also go ahead and delete /var/www/html/index.html which will then cause index.php to be loaded, thus redirecting you automatically to the fog/management page. 
