The problem: User data accumulates over time, and much of it is waste. There are many kinds of wasted data on a typical company’s network shares. Here are the top four, in no particular order:

  1. Users who make a copy of everything as a backup “just in case”. These users create a new copy every time they revise a document. That may not matter much for a 50 KB word processing document, but make it a 20 MB PowerPoint or a much larger engineering drawing and we are looking at serious wasted disk space.
  2. Users who do not trust IT. These are users who, possibly having experienced problems in the past, decide to copy their data to every network drive available to them. Say they have a home directory. They use it to back up everything from a common directory and everything they touch on an engineering share. Maybe they also use it to fulfill #1 above.
  3. Unauthorized file types. Some users will not listen no matter how many times they are told: no MP3 files on the network, no home videos or personal picture albums, no joke videos downloaded from the net, and so on.
  4. Data no longer relevant. Creating data is easy, but knowing when, or if, it is safe to delete takes thought and is not required to get one’s job done. What is the incentive for an employee to ever go back and clean up?

The solution:

  1. User education. Users need to understand the architecture to some degree. It makes no sense to have three copies of the same file on the same SAN, backed up to the same disk, tape, etc. It is no more secure than a single file.
  2. The network drives must also remain consistent. It is easy, and we as administrators sometimes feel it is the right thing, to take down only the minimum resources necessary to get a job done. But if users see the home drive go down less often than a common drive, they will tend to copy any data they might need onto their home drive to guard against losing access during these maintenance events. When possible, take everything (or nothing) down for the users when maintenance has to be done.
  3. There are two semi-solutions to unauthorized file types: a manual search and delete, or automated server-based software that prevents the files from being saved in the first place. Try to find a package that is reasonably priced. If you are serious about this, it is important to engage Human Resources and Legal, because this type of data puts your company at risk.
  4. Cleaning up irrelevant data is always going to be a manual process. Maintaining tight quotas is the only semi-solution. If users perceive that they are about to run out of space, and getting more takes more than a simple phone call, they are more likely to look at what they can delete before demanding more space.
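
The manual search mentioned in #3 can be scripted. The following is only a minimal sketch in Python, not a replacement for a commercial file-screening product. The function name `find_unauthorized` and the extension list are hypothetical; the list in particular is an assumption and should be replaced with whatever your own acceptable-use policy actually prohibits. The script only reports files and deletes nothing.

```python
import os

# Hypothetical blocklist for illustration only; adjust it to your own policy.
BLOCKED_EXTENSIONS = {".mp3", ".avi", ".mov", ".wmv"}


def find_unauthorized(root, blocked=BLOCKED_EXTENSIONS):
    """Yield (path, size_in_bytes) for each file under root with a blocked extension."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Compare extensions case-insensitively so "SONG.MP3" is caught too.
            extension = os.path.splitext(name)[1].lower()
            if extension not in blocked:
                continue
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # The file vanished or is unreadable; skip it rather than abort.
                continue
            yield path, size
```

Calling `list(find_unauthorized(some_share_path))` gives a list of offending files and their sizes that can be reviewed, and handed to Human Resources or Legal if needed, before anything is removed.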

General solutions or best practices:

  1. Installing deduplication software, or using a back-end deduplication solution, will transparently solve #1 and #2.
  2. For new data stores, storage management software that lets users set an expiration date on their data is a good solution.
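
Before buying a deduplication product, it can help to estimate how much space the duplicate copies from #1 and #2 actually consume. The sketch below is one rough way to do that in Python, assuming whole-file duplicates identified by size and SHA-256 hash. Commercial products often deduplicate at the block level, so this is not how any particular product works, and the real savings may differ. The function names are hypothetical, and the roots passed in are assumed not to overlap (an overlapping root would make a file look like a duplicate of itself).

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicates(roots):
    """Map (size, digest) to the list of paths sharing that exact content."""
    # Group by size first so that only files with a size twin get hashed.
    by_size = defaultdict(list)
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    by_size[os.path.getsize(path)].append(path)
                except OSError:
                    continue  # unreadable or vanished file

    by_content = defaultdict(list)
    for size, paths in by_size.items():
        if size == 0 or len(paths) < 2:
            continue  # empty files waste nothing; unique sizes cannot be duplicates
        for path in paths:
            try:
                by_content[(size, file_digest(path))].append(path)
            except OSError:
                continue
    return {key: paths for key, paths in by_content.items() if len(paths) > 1}


def wasted_bytes(duplicates):
    """Bytes that would be reclaimed if only one copy of each file were kept."""
    return sum(size * (len(paths) - 1) for (size, _digest), paths in duplicates.items())
```

For example, `wasted_bytes(find_duplicates([home_share, common_share]))`, where the two share paths are placeholders for your own mount points, gives a first approximation of what file-level deduplication would recover.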

I would love to have someone suggest or describe successful software packages that they used to solve or help with any of these items.
