PhD Studies PC - Database+R


ThePlanner

Original Poster:

5,252 posts

268 months

Thursday 23rd February 2017
I am looking for a PC for my studies and am undecided which setup would give the better workflow.

My course is a PhD in Data Analytics. I have access to some decent servers at work, reachable via VPN, but I am unable to synchronize the data directly between my home and office machines.

My options are below:

Option 1 - All data is stored and processed on the laptop
Dell XPS 9560 laptop (1TB SSD / 32GB RAM)

Or

Option 2 - Data processing is done on the server, and summary results are pushed to a SQL database on the laptop
Used Dell R610 server + my existing laptop
The Dell server will have 128GB RAM, a 1.2TB Intel PCIe SSD, 8x 600GB SAS drives and 2x Xeon X5650 CPUs (3-year-old used hardware)

My existing database is fairly small at around 500GB, expected to grow to 800GB. I am using R for the analysis.

I am trying to figure out which is better for my workflow.





Edited by ThePlanner on Thursday 23 February 07:27

ThePlanner


Thursday 23rd February 2017
HappyMidget said:
Also, using SQL 2016 and clustered columnstore indexing will massively compress the data. In my last data warehouse, 2bn rows compressed down to about 80GB from over 500GB.
The data is already in SQL 2016 and has been optimized.
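For scale, the figures HappyMidget quotes work out to a compression ratio of roughly 6:1. This is only arithmetic on the quoted numbers; columnstore ratios depend heavily on the data, so it may not carry over to a database that has already been optimized:

```latex
\text{ratio} \approx \frac{500\ \text{GB (uncompressed)}}{80\ \text{GB (columnstore)}} = 6.25
```

Since the original was "over 500GB", 6.25 is a lower bound for that warehouse.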

Du1point8 said:
Shame it's SQL.

What kind of data and analysis?

Is it time-based data or normal data?
The main element of my research is predictive analytics on the data.

The data contains multiple elements:
  • Weather
  • Vehicle - vehicle, person, location, speed
  • Sites of interest

V8LM said:
You will need to comply with the institutional and PhD-funder's Research Data Management policies and these will preclude using a home PC or laptop for the prime storage. Depending on the data they could preclude using any off-site/cloud facility too.

Option 1 should be out, although it is the commonly used approach.

Option 2 is better.

Better would be to SSH in to an institutional resource for all processing and use the laptop for preparation of figures, reports, etc.

Better still is to do all the work at the institution and use the time away for other stuff.

Edited by V8LM on Thursday 23 February 07:00
The data is provided by my employer and there are no restrictions on where I store it, except that it must not be online! The project sponsor is an overseas government, so it is not subject to data protection laws. I have an agreed reporting structure for the final thesis covering what data can and cannot be presented.

This has been agreed between the university in the UK and the government department.

Du1point8 said:
Shame it's SQL.

What kind of data and analysis?

Is it time-based data or normal data?
At the moment it is SQL, as it is a direct duplicate of the data from my employer. That does not stop me from transferring it to another platform if required.

Edited by ThePlanner on Thursday 23 February 07:20


Edited by ThePlanner on Thursday 23 February 07:26

ThePlanner


Thursday 23rd February 2017
Vaud said:
How do they define "not online"?
Not stored in the Amazon cloud or similar.

Last year we invested $5 million in our own internal data/compute store for the department. I have access to it, but I do not want my research work running alongside the production environment.

We have 50x Cisco UCS servers + 360TB of EMC SAN storage
800 cores (1600 threads) + 12,800GB RAM in total
40 servers are used as a Hadoop cluster; the remaining 10 host 10 Windows VMs that give other users easy access to the data in SQL + ESRI GIS.
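Dividing those totals across the servers gives the per-node picture. This is a minimal Python sketch that assumes all 50 servers are identically specced, which the post does not actually state:

```python
# Totals quoted in the post. ASSUMPTION: all 50 servers are identical,
# so each total divides evenly between them.
TOTAL_SERVERS = 50
TOTAL_CORES = 800
TOTAL_THREADS = 1600
TOTAL_RAM_GB = 12_800
HADOOP_SERVERS = 40
WINDOWS_SERVERS = TOTAL_SERVERS - HADOOP_SERVERS  # the 10 hosting Windows VMs


def per_server(total: int, servers: int = TOTAL_SERVERS) -> int:
    """Split a cluster-wide total evenly; fail loudly if it does not divide."""
    share, remainder = divmod(total, servers)
    if remainder:
        raise ValueError(f"{total} does not split evenly across {servers} servers")
    return share


cores = per_server(TOTAL_CORES)        # 16 cores per server
threads = per_server(TOTAL_THREADS)    # 32 threads per server
ram_gb = per_server(TOTAL_RAM_GB)      # 256GB RAM per server

hadoop_cores = HADOOP_SERVERS * cores    # 640 cores in the Hadoop cluster
hadoop_ram_gb = HADOOP_SERVERS * ram_gb  # 10,240GB RAM in the Hadoop cluster
windows_cores = WINDOWS_SERVERS * cores  # 160 cores behind the Windows VMs

print(cores, threads, ram_gb, hadoop_cores, hadoop_ram_gb, windows_cores)
```

Under the same even-split assumption, the R610 in Option 2 (2x six-core X5650, 12 cores / 24 threads, 128GB RAM) would have half the RAM of a single one of these nodes.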

[Image: early system design; things were changed after initial system testing]

[Image: hardware installed in server room]

ThePlanner


Thursday 23rd February 2017
plasticpig said:
Probably teaching you to suck eggs but isn't the laptop out of the equation purely due to DB size? It depends on exactly what you are doing I guess but how big is the TempDB going to grow to on an 800GB DB?
I hadn't thought about TempDB growing; I've never had to worry about space restrictions before. Thanks, that could be an issue on the laptop.
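A rough way to frame plasticpig's point is to add an allowance for TempDB and other working space to the database size and compare the total with the disk. The sketch below is hypothetical: the 25% TempDB allowance and 100GB reserve are illustrative guesses, not measured figures, and real TempDB usage depends on the queries being run.

```python
from dataclasses import dataclass


@dataclass
class StoragePlan:
    disk_gb: float          # usable capacity of the drive
    database_gb: float      # size of the database files
    tempdb_fraction: float  # HYPOTHETICAL: peak TempDB size as a fraction of the database
    reserve_gb: float       # HYPOTHETICAL: OS, R, logs and a free-space margin

    def required_gb(self) -> float:
        """Database + estimated TempDB peak + reserve."""
        return self.database_gb * (1 + self.tempdb_fraction) + self.reserve_gb

    def fits(self) -> bool:
        return self.required_gb() <= self.disk_gb


# Disk sizes from the post: 1TB laptop SSD, 1.2TB Intel PCIe SSD in the R610.
laptop_now = StoragePlan(disk_gb=1000, database_gb=500, tempdb_fraction=0.25, reserve_gb=100)
laptop_later = StoragePlan(disk_gb=1000, database_gb=800, tempdb_fraction=0.25, reserve_gb=100)
server_later = StoragePlan(disk_gb=1200, database_gb=800, tempdb_fraction=0.25, reserve_gb=100)

for name, plan in [("laptop, 500GB", laptop_now), ("laptop, 800GB", laptop_later),
                   ("R610 SSD, 800GB", server_later)]:
    print(f"{name}: needs {plan.required_gb():.0f}GB, fits={plan.fits()}")
```

With these guesses the laptop copes with today's 500GB but not with 800GB, while the server's PCIe SSD alone still has room; different allowances can flip the answer.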

So it looks like the used server, with summary data transferred to the laptop.
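The Option 2 workflow (heavy aggregation where the full data lives, small summary tables pushed to the laptop) can be sketched as below. This is illustrative only: Python's built-in sqlite3 stands in for SQL Server 2016 on both machines, and the `observations` and `speed_summary` tables and their columns are hypothetical names, not the actual schema.

```python
import sqlite3


def build_server_db() -> sqlite3.Connection:
    """Hypothetical raw table standing in for the detailed data on the server."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE observations (vehicle_id TEXT, location TEXT, speed REAL)")
    conn.executemany(
        "INSERT INTO observations VALUES (?, ?, ?)",
        [("v1", "site_a", 30.0), ("v2", "site_a", 50.0), ("v1", "site_b", 70.0)],
    )
    conn.commit()
    return conn


def summarise_on_server(server: sqlite3.Connection) -> list[tuple[str, int, float]]:
    """Run the aggregation where the full data lives; only small results leave."""
    return server.execute(
        "SELECT location, COUNT(*), AVG(speed) FROM observations "
        "GROUP BY location ORDER BY location"
    ).fetchall()


def push_to_laptop(laptop: sqlite3.Connection, rows: list[tuple[str, int, float]]) -> None:
    """Upsert the summary rows into the laptop's local database."""
    laptop.execute(
        "CREATE TABLE IF NOT EXISTS speed_summary "
        "(location TEXT PRIMARY KEY, n_obs INTEGER, mean_speed REAL)"
    )
    laptop.executemany("INSERT OR REPLACE INTO speed_summary VALUES (?, ?, ?)", rows)
    laptop.commit()


server = build_server_db()
laptop = sqlite3.connect(":memory:")
push_to_laptop(laptop, summarise_on_server(server))
print(laptop.execute("SELECT * FROM speed_summary ORDER BY location").fetchall())
# [('site_a', 2, 40.0), ('site_b', 1, 70.0)]
```

Only the grouped rows cross the link, so the laptop database stays small however large the raw table grows.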

ThePlanner


Thursday 23rd February 2017
This will be fun... I will need to transport the server (35kg+) on my return flight next month.

Nothing available locally!