PhD Studies PC - Database+R
Discussion
I am looking for a PC for my studies. I am undecided which would be the better workflow.
My course is a PhD in Data Analytics. I have access to some decent Servers at work and get access via VPN. But I am unable to directly synchronize the data between home and office Machines.
My choices are below
Option 1 - All Data is stored & Processed on Laptop
Dell XPS 9560 Laptop (1TB SSD / 32Gig RAM)
Or
Option 2 - Data processing undertaken on Server, then Summary Results pushed to laptop SQL database
Used Dell R610 Server + Use my existing Laptop
Dell Server will have 128Gig Ram / 1.2 TB Intel PCIe SSD + 8x 600Gig SAS Drives + 2x Xeon X5650 (3 year old used hardware)
My existing database is small at around 500 Gig, expected to grow to 800 Gig. I am using R to do the analysis.
I am trying to figure out which is best for workflow.
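For what it's worth, the Option 2 workflow (aggregate where the data lives, move only the small result set to the laptop) can be sketched roughly as below. This is only an illustration under stated assumptions: both databases are in-memory SQLite stand-ins, and the `observations` and `site_summary` tables and their columns are made up for the example. The real setup in this thread is SQL Server with R, so the connection code would differ.

```python
import sqlite3


def build_server_db():
    """Stand-in for the large server-side database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE observations (site TEXT, speed_kmh REAL)")
    conn.executemany(
        "INSERT INTO observations VALUES (?, ?)",
        [("A", 50.0), ("A", 70.0), ("B", 30.0), ("B", 40.0), ("B", 50.0)],
    )
    return conn


def summarise(server):
    """Run the heavy aggregation on the server; only a few rows come back."""
    return server.execute(
        "SELECT site, COUNT(*), AVG(speed_kmh) "
        "FROM observations GROUP BY site ORDER BY site"
    ).fetchall()


def push_summary(laptop, rows):
    """Insert or overwrite the summary rows in the laptop-side table."""
    laptop.execute(
        "CREATE TABLE IF NOT EXISTS site_summary "
        "(site TEXT PRIMARY KEY, n INTEGER, mean_speed_kmh REAL)"
    )
    laptop.executemany(
        "INSERT OR REPLACE INTO site_summary VALUES (?, ?, ?)", rows
    )
    laptop.commit()


server = build_server_db()
laptop = sqlite3.connect(":memory:")  # stand-in for the laptop SQL database
summary_rows = summarise(server)
push_summary(laptop, summary_rows)
laptop_rows = laptop.execute(
    "SELECT * FROM site_summary ORDER BY site"
).fetchall()
print(laptop_rows)
```

The point of the design is that the laptop only ever holds the summary table, so its disk and RAM stop being the limiting factor.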
Edited by ThePlanner on Thursday 23 February 07:27
SQL Server on Microsoft's cloud?
Or a machine at your work that you can remote desktop into from home.
That would allow you to leave analysis tasks running while you travel and still have full-bandwidth access to your data when you're at home.
Edited by nyt on Thursday 23 February 06:56
You will need to comply with the institutional and PhD-funder's Research Data Management policies and these will preclude using a home PC or laptop for the prime storage. Depending on the data they could preclude using any off-site/cloud facility too.
Option 1 should be out, although it is the commonly used approach.
Option 2 is better.
Better would be to SSH-in to an institutional resource for all processing and use the laptop for preparation of figures, reports, etc.
Better still is to do all the work at the institution and use the time away for other stuff.
Edited by V8LM on Thursday 23 February 07:00
HappyMidget said:
Also, using sql2016 and clustered columnstore indexing will massively compress the data. In my last data warehouse, 2bn rows compressed down to about 80GB from over 500GB.
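As a rough back-of-envelope on the figures quoted above: "over 500GB" down to "about 80GB" is a ratio of at least 6.25. The sketch below applies that same ratio to an 800 Gig database purely as an illustration. That the ratio carries over is an assumption; columnstore compression depends heavily on the data, and a database that is already compressed will not shrink again.

```python
def compression_ratio(raw_gb, compressed_gb):
    """Ratio of uncompressed to compressed size."""
    return raw_gb / compressed_gb


def projected_size_gb(raw_gb, ratio):
    """Size after compression IF the same ratio applied (an assumption)."""
    return raw_gb / ratio


# Figures from the quoted post: over 500GB compressed to about 80GB.
ratio = compression_ratio(500, 80)

# ASSUMPTION: the same ratio holds for a different dataset.
for raw in (500, 800):
    print(f"{raw} GB raw -> about {projected_size_gb(raw, ratio):.0f} GB")
```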
The data is already in SQL 2016 and has been optimized.
Du1point8 said:
Shame it's SQL.
What kind of data and analysis?
Is it time-based data or normal data?
The main element of my research is predictive analytics of the data.
The data contains multiple elements:
- Weather
- Vehicle: vehicle, person, location, speed
- Sites of interest
V8LM said:
You will need to comply with the institutional and PhD-funder's Research Data Management policies and these will preclude using a home PC or laptop for the prime storage. Depending on the data they could preclude using any off-site/cloud facility too.
Option 1 should be out, although it is the commonly used approach.
Option 2 is better.
Better would be to SSH-in to an institutional resource for all processing and use the laptop for preparation of figures, reports, etc.
Better still is to do all the work at the institution and use the time away for other stuff.
The data is provided by my employer and there are no restrictions on where I store it, except that it must not be online! The project sponsor is an overseas government, so it is not subject to Data Protection Laws. I have an agreed reporting structure for the final thesis covering what data can and cannot be presented.
This has been agreed between the University in UK and the Government Department.
Du1point8 said:
Shame it's SQL.
What kind of data and analysis?
Is it time-based data or normal data?
At the moment it is SQL, as it is a direct duplicate of the data from my employer. That does not stop me from transferring to another platform if required.
Edited by ThePlanner on Thursday 23 February 07:20
Edited by ThePlanner on Thursday 23 February 07:26
Vaud said:
How do they define "not online"?
Not stored in Amazon Cloud or similar. We invested $5 million last year in our own internal data/compute store for the department. So I have access to this, but I do not want to have my research work running alongside the production environment.
We have 50x Cisco UCS servers + 360TB of EMC SAN storage
800 cores (1600 threads) + 12800 Gig RAM in total
40x servers are used as a Hadoop cluster; the remaining 10 are used as 10 Windows VMs for ease of access to the data for other users in SQL + ESRI GIS
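To put those totals in per-machine terms, here is a quick calculation. It assumes all 50 servers are identical, which the post does not actually state, so treat the per-server figures as averages rather than a spec.

```python
# Totals quoted in the post.
TOTAL_SERVERS = 50
TOTAL_CORES = 800
TOTAL_THREADS = 1600
TOTAL_RAM_GB = 12800
HADOOP_SERVERS = 40

# ASSUMPTION: identical servers, so the totals divide evenly.
cores_per_server = TOTAL_CORES // TOTAL_SERVERS
threads_per_server = TOTAL_THREADS // TOTAL_SERVERS
ram_per_server_gb = TOTAL_RAM_GB // TOTAL_SERVERS


def pool(servers):
    """Aggregate cores and RAM for a group of identical servers."""
    return {
        "servers": servers,
        "cores": servers * cores_per_server,
        "ram_gb": servers * ram_per_server_gb,
    }


hadoop_pool = pool(HADOOP_SERVERS)
vm_pool = pool(TOTAL_SERVERS - HADOOP_SERVERS)

print("per server:", cores_per_server, "cores,", ram_per_server_gb, "GB RAM")
print("Hadoop cluster:", hadoop_pool)
print("Windows VM hosts:", vm_pool)
```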
Early system design (things were changed after initial system testing)
Hardware installed in Server Room
plasticpig said:
Probably teaching you to suck eggs but isn't the laptop out of the equation purely due to DB size? It depends on exactly what you are doing I guess but how big is the TempDB going to grow to on an 800GB DB?
I didn't think about the TempDB growing. I have never had to worry about space restrictions before. But thanks, that could be an issue on the laptop. So it looks like the used server plus transferring summary data to the laptop.
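The space squeeze is easy to see with the numbers already in the thread. The sketch below just subtracts the expected 800 Gig database from the nominal drive sizes. It uses decimal gigabytes and ignores formatting overhead, the OS, and RAID losses, and it makes no attempt to predict how large TempDB actually gets, since that depends on the queries.

```python
def headroom_gb(capacity_gb, database_gb):
    """Space left for OS, TempDB, logs and everything else."""
    return capacity_gb - database_gb


EXPECTED_DB_GB = 800  # expected size quoted in the first post

laptop_ssd_gb = 1000        # Option 1: 1TB SSD (nominal)
server_pcie_ssd_gb = 1200   # Option 2: 1.2TB Intel PCIe SSD (nominal)
server_sas_raw_gb = 8 * 600  # Option 2: 8x 600 Gig SAS, raw, before any RAID

laptop_free = headroom_gb(laptop_ssd_gb, EXPECTED_DB_GB)
server_ssd_free = headroom_gb(server_pcie_ssd_gb, EXPECTED_DB_GB)

print(f"Laptop SSD headroom: {laptop_free} GB "
      f"({laptop_free / laptop_ssd_gb:.0%} of the drive)")
print(f"Server PCIe SSD headroom: {server_ssd_free} GB")
print(f"Server SAS capacity (raw): {server_sas_raw_gb} GB")
```

On the laptop that leaves roughly a fifth of the drive for everything that is not the database itself, which is why TempDB growth matters there and much less so on the server.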
Are you looking at virtualising it on the laptop or running it natively? I personally prefer using virtualisation as it's easier to back up the entire thing and I can always port it to different hardware if needed. Granted, there is a performance penalty paid by not running it natively as you have a "wrapper" around the application, but the benefits outweigh that for me.
A decent workstation class laptop is more than capable of running sizable virtual machines for this: http://www8.hp.com/us/en/workstations/zbook-15.htm...
I have a similar laptop that I use for demos/testing (can't connect to the network in a lot of places I work) that can quite happily run five virtual servers performing a reasonable workload.
The biggest initial bottleneck with any laptop running virtual servers is the hard disks. I use two high-performance SSDs built into the laptop and have the drives of my virtual servers split over the two drives, rather than running them as RAID, to reduce contention. I found this works better than RAID for me; your mileage may vary. You could also add in a 3rd or 4th drive using a decent external SSD plugged in via USB.